Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040861
Yuanyuan Yang, Jianchao Wang
Many emerging network applications, such as teleconferencing and information services, require group communication, in which messages from one or more senders are delivered to a large number of receivers. We consider efficient network support for a key type of group communication: conferencing. A conference is a group of members in a network who communicate with each other within the group. In our recent work (Yang, 2001), we proposed a design for a conference network that can support multiple disjoint conferences. The major component of the network is an enhanced multistage switching network that interconnects switch modules with fan-in and fan-out capability. The multistage network used is modified from an indirect binary cube network by relaying all internal outputs at each stage through multiplexers to the outputs of the network. Each conference is realized in an indirect binary cube-like subnetwork depending on its location. A natural question is: can we directly adopt a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, to obtain a conference network with a more regular structure, a simpler self-routing algorithm, and lower hardware cost? This paper aims to answer this question. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, that is, the maximum number of conflicting parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network.
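To illustrate the self-routing property such multistage networks offer, here is a minimal sketch (an assumption-laden illustration of classic destination-tag routing in an omega network, not the paper's conference-routing algorithm): each stage applies a perfect shuffle to the input label and then substitutes one destination bit via the switch setting.

```python
def omega_route(src, dst, n_bits):
    """Trace a packet's self-routing path through an N=2^n_bits omega network.

    At each stage the label is cyclically left-shifted (perfect shuffle) and
    its low bit is replaced by the next destination bit (the switch setting).
    Returns the sequence of labels visited, ending at dst.
    """
    mask = (1 << n_bits) - 1
    node = src
    path = [node]
    for stage in range(n_bits):
        # perfect shuffle: rotate the n_bits label left by one position
        node = ((node << 1) | (node >> (n_bits - 1))) & mask
        # switch setting: overwrite the low bit with destination bit `stage`
        dst_bit = (dst >> (n_bits - 1 - stage)) & 1
        node = (node & ~1) | dst_bit
        path.append(node)
    return path
```

After n_bits stages every destination bit has been substituted in, so the path always terminates at the destination regardless of the source, which is what makes the routing "self-routing": no global computation is needed.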
Title: A class of multistage conference switching networks for group communication
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040895
Umit Rencuzogullari, S. Dwarkadas
Clusters of workstations (COWs) offer high performance relative to their cost. Generally, these clusters operate as autonomous systems running independent copies of the operating system, where access to machines is not controlled and all users enjoy the same access privileges. While these features are desirable and reduce operating costs, they have adverse effects on parallel applications running on these clusters. Load imbalances are common for parallel applications on COWs due to: 1) a variable amount of load on nodes caused by an inherent lack of parallelism, 2) variable resource availability on nodes, and 3) independent scheduling decisions made by the independent schedulers on each node. Our earlier study has shown that an approach combining static program analysis, dynamic load balancing, and scheduler cooperation is effective in countering the adverse effects mentioned above. In our current study, we investigate the scalability of our approach as the number of processors is increased. We further relax the requirement of global synchronization, avoiding the need to use barriers and allowing the use of any other synchronization primitives while still achieving dynamic load balancing. The use of alternative synchronization primitives avoids the inherent vulnerability of barriers to load imbalance. It also allows load balancing to take place at any point in the course of execution, rather than only at a synchronization point, potentially reducing the time the application runs imbalanced. Moreover, load readjustment decisions are made in a distributed fashion, eliminating any need for processes to globally synchronize in order to redistribute load.
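The core of such load readjustment can be pictured as redistributing loop iterations in proportion to each node's measured processing rate. The fragment below is a hypothetical sketch of that idea only; the function name and the simple proportional rule are assumptions, not the paper's combined static-analysis/scheduler-cooperation mechanism.

```python
def rebalance(work, rates):
    """Redistribute a total amount of work in proportion to per-node rates.

    work  -- current iteration counts per node (only their sum matters here)
    rates -- measured processing rates per node (higher = faster)
    Returns new per-node iteration counts with the same total.
    """
    total = sum(work)
    total_rate = sum(rates)
    shares = [int(total * r / total_rate) for r in rates]
    shares[0] += total - sum(shares)  # hand the rounding remainder to node 0
    return shares
```

A node observing that a neighbor runs at triple its rate would, under this rule, surrender iterations until the 3:1 split is reached, without any global barrier.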
Title: A technique for adaptation to available resources on clusters independent of synchronization methods used
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040915
Peter Sulatycke, K. Ghose
We present in-core and out-of-core parallel techniques for isosurface rendering based on the notion of span-space buckets. Our in-core technique makes conservative use of RAM and is amenable to parallelization. The out-of-core variant keeps the amount of data read in the search process to a minimum, visiting only the cells that intersect the isosurface. The out-of-core technique additionally minimizes disk I/O time through in-order seeking, interleaving data records on the disk, and overlapping computational and I/O threads. The overall isosurface rendering time achieved using our out-of-core span-space buckets is comparable to that of well-optimized in-core techniques that have enough RAM at their disposal to avoid thrashing. When the RAM size is limited, our out-of-core span-space bucket technique maintains its performance level, while in-core algorithms either start to thrash or must sacrifice performance for a smaller memory footprint.
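The span-space idea can be sketched briefly: each cell is represented by its scalar span (min, max), cells are grouped into buckets by quantized span coordinates, and a query for isovalue v only needs buckets whose min-coordinate is at or below v's bucket and whose max-coordinate is at or above it. The function names and the uniform quantization below are illustrative assumptions, not the paper's disk layout.

```python
from collections import defaultdict

def build_buckets(cells, n_buckets, vmin, vmax):
    """Group cells into span-space buckets keyed by quantized (min, max)."""
    width = (vmax - vmin) / n_buckets
    buckets = defaultdict(list)
    for cell_id, (lo, hi) in cells.items():
        i = min(int((lo - vmin) / width), n_buckets - 1)
        j = min(int((hi - vmin) / width), n_buckets - 1)
        buckets[(i, j)].append(cell_id)
    return buckets

def active_cells(buckets, cells, isovalue, n_buckets, vmin, vmax):
    """Return cells whose span contains the isovalue: lo <= v <= hi."""
    width = (vmax - vmin) / n_buckets
    k = min(int((isovalue - vmin) / width), n_buckets - 1)
    out = []
    # a bucket (i, j) can hold active cells only if i <= k <= j
    for (i, j), ids in buckets.items():
        if i <= k <= j:
            out.extend(c for c in ids if cells[c][0] <= isovalue <= cells[c][1])
    return out
```

Only candidate buckets are visited, and within them an exact span test filters boundary cases, so the search touches roughly the cells that actually intersect the isosurface.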
Title: Multithreaded isosurface rendering on SMPs using span-space buckets
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040903
C. Yeh, B. Parhami
We formulate array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required, and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1+o(1). The number of faults tolerated by ARTs ranges from o(min(n^(1-1/d), n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components, such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
Title: ART: robustness of meshes and tori for parallel and distributed computation
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040881
S. Bromling, S. MacDonald, J. Anvik, J. Schaeffer, D. Szafron, K. Tan
The advantages of pattern-based programming have been well documented in the sequential programming literature. However, patterns have yet to make their way into mainstream parallel computing, even though several research tools support them. Pattern-based (or template-based) systems for parallel programming have two critical shortcomings: lack of extensibility and poor performance. This paper describes our approach to addressing these problems in the CO2P3S parallel programming system. CO2P3S supports multiple levels of abstraction, allowing the user to design an application with high-level patterns but move to lower levels of abstraction for performance tuning. Patterns are implemented as parameterized templates, allowing the user to customize a pattern to meet their needs. CO2P3S generates code that is specific to the pattern/parameter combination selected by the user. The MetaCO2P3S tool addresses extensibility by giving users the ability to design and add new pattern templates to CO2P3S. Since the pattern templates are stored in a system-independent format, they are suitable for storing in a repository to be shared throughout the user community.
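The division of labor in a pattern template can be shown in miniature: the pattern fixes the parallel structure, while the user supplies only the application-specific code as a parameter. The task-farm sketch below is a generic illustration in Python under those assumptions; it is not CO2P3S itself, which generates framework code from graphical pattern selections.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, tasks, n_workers=4):
    """A minimal 'task farm' pattern template.

    The parallel structure (pool of workers consuming independent tasks,
    results collected in order) is fixed by the pattern; the user-supplied
    `worker` function is the template's parameter.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))
```

A user "instantiates" the pattern by passing their sequential kernel, e.g. `farm(render_tile, tiles)`, and never writes thread-management code; performance tuning in a layered system would then mean opening up and editing the generated structure rather than this one-call interface.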
Title: Pattern-based parallel programming
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040856
María J. Martín, D. E. Singh, J. Touriño, F. F. Rivera
The goal of this work is the efficient parallel execution of loops with indirect array accesses, to be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to analyze. In this paper, we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other based on a block-cyclic distribution. Experimental results show that the two strategies are complementary, and the choice of the best alternative depends on features of the loop pattern.
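Run-time parallelization of such loops is commonly organized as an inspector/executor scheme: an inspector scans the index array once and groups iterations into wavefronts that can each run in parallel, because no two iterations in the same wavefront touch the same array element. This sketch shows only that generic inspector idea, under the simplifying assumption of one indirect access per iteration; it is not the paper's graph-partitioning or block-cyclic strategy.

```python
def inspector(index_array):
    """Group loop iterations i of `a[index_array[i]] = f(...)` into wavefronts.

    Iterations that touch the same element are placed in successive
    wavefronts, preserving their original order; iterations within one
    wavefront touch distinct elements and may execute concurrently.
    """
    last_wave = {}  # element index -> last wavefront that touched it
    waves = []
    for it, idx in enumerate(index_array):
        w = last_wave.get(idx, -1) + 1  # one wave after the last conflict
        if w == len(waves):
            waves.append([])
        waves[w].append(it)
        last_wave[idx] = w
    return waves
```

The executor then runs each wavefront's iterations in parallel and synchronizes between wavefronts; a locality-aware variant would additionally order or partition each wavefront so that iterations touching nearby elements land on the same processor.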
Title: Exploiting locality in the run-time parallelization of irregular loops
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040906
K. Nakano
A broadcast communication model (BCM) is a distributed system with no central arbiter, populated by n processing units referred to as stations. The stations communicate by broadcasting/receiving data packets on one of k communication channels. We assume that the stations run on batteries and expend power while broadcasting/receiving a data packet. Thus, the most important measure for evaluating algorithms on the BCM is the number of awake time slots, in which a station is broadcasting/receiving a data packet. We also assume that the stations are identical and have no unique ID number, and that no station knows the number n of stations. Given n keys, one per station, the ranking problem asks each station to determine the number of keys in the BCM smaller than its own key. The main contribution of the paper is an optimal randomized ranking algorithm on the k-channel BCM. Our algorithm solves the ranking problem, with high probability, in O(n/k + log n) time slots, with no station being awake for more than O(log n) time slots. We also prove that any randomized ranking algorithm must run in expected Ω(n/k + log n) time slots, with at least one station being awake for expected Ω(log n) time slots. Therefore, our ranking algorithm is optimal.
Title: An optimal randomized ranking algorithm on the k-channel broadcast communication model
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040884
Xueyan Tang, Fan Zhang, S. Chanson
Streaming media is expected to become one of the most popular types of Web content in the future. Due to the increasing variety of client devices and the range of access speeds to the Internet, multimedia content may need to be transcoded to match a client's capabilities. With transcoding, both the network and the proxy CPU are potential bottlenecks for streaming media delivery. This paper discusses and compares various caching algorithms designed for transcoding proxies. In particular, we propose a new adaptive algorithm that dynamically selects an appropriate metric for adjusting the management policy. Experimental results show that the proposed algorithm significantly outperforms those that cache only untranscoded or only transcoded objects. Moreover, motivated by the characteristics of many video compression algorithms, we investigate partitioning a video object into sections based on frame type and handling them individually for proxy caching. We find that partitioning improves performance when CPU power rather than network bandwidth is the limiting resource, particularly when the reference pattern is not highly skewed.
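The trade-off such a cache weighs can be made concrete with a per-byte profit function: caching an untranscoded object saves future network fetches, while caching a transcoded version additionally saves the CPU cost of re-transcoding. The function below is a hypothetical illustration of that trade-off; its name, parameters, and weighting are assumptions, not the paper's adaptive metric.

```python
def cache_profit(ref_rate, size_bytes, net_cost, cpu_cost, is_transcoded):
    """Per-byte profit of keeping an object version in the cache.

    ref_rate      -- estimated references per unit time to this version
    size_bytes    -- bytes of cache space the version occupies
    net_cost      -- cost of fetching the object over the network on a miss
    cpu_cost      -- cost of transcoding the object on the proxy CPU
    is_transcoded -- True if the cached version is already transcoded
    """
    saving = net_cost + (cpu_cost if is_transcoded else 0.0)
    return ref_rate * saving / size_bytes
```

An adaptive policy could then rank eviction candidates by this profit and shift the relative weight of `net_cost` versus `cpu_cost` depending on which resource is currently the bottleneck.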
Title: Streaming media caching algorithms for transcoding proxies
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040873
Pei Zheng, L. Ni
The development and implementation of new network protocols and applications need accurate, scalable, reconfigurable, and inexpensive tools for debugging, testing, performance tuning, and evaluation. Network emulation provides a fully controllable laboratory network environment in which protocols and applications can be evaluated against predefined network conditions and traffic dynamics. In this paper, we present a new network emulation framework, EMPOWER. EMPOWER is capable of generating a suitable network model from the information of an emulated network, and then mapping the model to an emulation configuration in the EMPOWER laboratory network environment. It is highly scalable, not only because the number of emulator nodes may be increased without significantly increasing the emulation time or requiring parallel simulation, but also because the network mapping scheme allows flexible port aggregation and derivation. By dynamically configuring a virtual device, effects such as link bandwidth, packet delay, packet loss rate, and out-of-order delivery can be emulated.
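What such a virtual device does per packet can be sketched in a few lines: drop with some probability, queue behind earlier packets at the link's serialization rate, and add a fixed propagation delay. The class below is a toy model under those assumptions; its name and interface are illustrative, not EMPOWER's actual configuration API.

```python
import random

class EmulatedLink:
    """Toy virtual device imposing bandwidth, delay, and loss on packets."""

    def __init__(self, bandwidth_bps, delay_s, loss_rate, seed=0):
        self.bandwidth = bandwidth_bps
        self.delay = delay_s
        self.loss = loss_rate
        self.rng = random.Random(seed)
        self.busy_until = 0.0  # when the link finishes its current packet

    def send(self, now, size_bytes):
        """Return the packet's arrival time, or None if it is dropped."""
        if self.rng.random() < self.loss:
            return None                           # random loss
        start = max(now, self.busy_until)         # queue behind earlier packets
        tx = size_bytes * 8 / self.bandwidth      # serialization time
        self.busy_until = start + tx              # link occupied until then
        return start + tx + self.delay            # plus propagation delay
```

Reordering could be layered on top by occasionally perturbing arrival times; in a real emulator these parameters would be reconfigurable at run time to reproduce changing network conditions.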
Title: EMPOWER: a scalable framework for network emulation
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040885
Xin Chen, Xiaodong Zhang
Prediction by partial match (PPM) is a commonly used technique in Web prefetching, where prefetching decisions are made based on historical URLs in a dynamically maintained Markov prediction tree. Existing approaches either store URL nodes broadly by building the tree with a fixed height in each branch, or store only the branches with frequently accessed URLs. Building popularity information into the Markov prediction tree, we propose a new prefetching model, called popularity-based PPM. In this model, the tree is dynamically updated with a variable height in each set of branches, so that a popular URL leads a set of long branches while a less popular document leads a set of short ones. Since the majority of root nodes are popular URLs in our approach, the space allocated for storing nodes is effectively utilized. We include two additional optimizations in this model: (1) directly linking a root node to duplicated popular nodes in a surfing path, to give popular URLs more consideration for prefetching; and (2) optimizing space after the tree is built to further remove less popular nodes. Our trace-driven simulation results show a significant space reduction and improved prediction accuracy for the proposed prefetching technique.
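The underlying Markov prediction step is easy to show in miniature: count how often each URL follows the current one, and prefetch the most frequent successor. The order-1 sketch below illustrates only that base mechanism; the paper's contribution, a variable-height tree whose branch depth tracks URL popularity, is not modeled here.

```python
from collections import defaultdict

class PPMPredictor:
    """Order-1 Markov predictor over a URL access stream (illustrative)."""

    def __init__(self):
        # counts[u][v] = times URL v was accessed immediately after URL u
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def access(self, url):
        """Record an access, updating the transition counts."""
        if self.prev is not None:
            self.counts[self.prev][url] += 1
        self.prev = url

    def predict(self):
        """Return the most likely next URL, or None with no history."""
        if self.prev is None or not self.counts[self.prev]:
            return None
        nxt = self.counts[self.prev]
        return max(nxt, key=nxt.get)
```

A popularity-based variant would, roughly, extend this to longer contexts only for URLs whose access counts exceed a threshold, so that frequently visited pages get deep (more accurate) branches while rare pages stay shallow and cheap to store.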
Title: Popularity-based PPM: an effective Web prefetching technique for high accuracy and low storage