Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040861
Yuanyuan Yang, Jianchao Wang
Many emerging network applications, such as teleconferencing and information services, require group communication, in which messages from one or more senders are delivered to a large number of receivers. We consider efficient network support for a key type of group communication: conferencing. A conference is a group of members in a network who communicate with each other within the group. In our recent work (Yang, 2001), we proposed a design for a conference network that can support multiple disjoint conferences. The major component of the network is an enhanced multistage switching network that interconnects switch modules with fan-in and fan-out capability. The multistage network used is modified from an indirect binary cube network by relaying all internal outputs at each stage through multiplexers to the outputs of the network. Each conference is realized in an indirect binary cube-like subnetwork depending on its location. A natural question is: can we directly adopt a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, to obtain a conference network with a more regular structure, a simpler self-routing algorithm, and lower hardware cost? This paper aims to answer this question. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, that is, the maximum number of conflicting parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network.
Title: "A class of multistage conference switching networks for group communication" (Proceedings International Conference on Parallel Processing)
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040901
Huanjing Wang, Guangbin Fan, Jingyuan Zhang
Location management deals with how to track mobile users within a cellular network. It consists of two basic operations: location update and paging. The total location management cost is the sum of the location update cost and the paging cost. Location areas and reporting centers are two popular location management schemes. The motivation for this study is the observation that the location update cost difference between the reporting centers scheme and the location areas scheme is small, whereas the paging cost in the reporting centers scheme is larger than that in the location areas scheme. The paper compares the performance of the two schemes under aggregate movement behavior mobility models through simulation. Simulation results show that the location areas scheme performs about the same as the reporting centers scheme in two extreme cases, that is, when a few cells or almost all cells are selected as reporting cells. However, the location areas scheme outperforms the reporting centers scheme at the 100% confidence level for all call-to-mobility ratios when the reporting cells divide the whole service area into several regions.
Title: "Performance comparison of location areas and reporting centers under aggregate movement behavior mobility models"
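The abstract's cost model (total cost = location update cost + paging cost) can be sketched as follows. The function and all numeric values are illustrative assumptions for comparing two hypothetical schemes, not figures from the paper:

```python
def total_cost(num_updates, update_cost, num_calls, cells_paged, page_cost):
    """Total location management cost = location update cost + paging cost.

    num_updates: location updates performed by the mobile user
    num_calls:   incoming calls, each triggering one paging operation
    cells_paged: cells polled per paging operation
    """
    return num_updates * update_cost + num_calls * cells_paged * page_cost

# Hypothetical comparison: a scheme with slightly fewer updates but wider
# paging vs. one with more updates but narrower paging.
reporting_centers = total_cost(num_updates=8, update_cost=2,
                               num_calls=4, cells_paged=9, page_cost=1)
location_areas = total_cost(num_updates=10, update_cost=2,
                            num_calls=4, cells_paged=5, page_cost=1)
```

With these made-up parameters the narrower-paging scheme wins (40 vs. 52), mirroring the abstract's observation that a small update-cost difference can be outweighed by a larger paging cost.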
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040878
D. Xiang, Ai Chen
A limited-global-safety-information-based metric called local safety is proposed to handle fault-tolerant routing in 2D tori (or meshes). Sufficient conditions for the existence of a minimum feasible path between the source and destination are presented based on local safety information in a 2D torus network. An efficient heuristic function is defined to guide fault-tolerant routing inside a 2D torus network. Unlike conventional methods based on the block fault model, our method does not disable any fault-free nodes, and fault-free nodes inside a fault block can still be a source or a destination, which can greatly increase the throughput and computational power of the system. Techniques for deadlock avoidance are introduced. Extensive simulation results are presented.
Title: "Fault-tolerant routing in 2D tori or meshes using limited-global-safety information"
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040903
C. Yeh, B. Parhami
We formulate array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required, and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1+o(1). The number of faults tolerated by ARTs ranges from o(min(n^(1-1/d), n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components, such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
Title: "ART: robustness of meshes and tori for parallel and distributed computation"
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040881
S. Bromling, S. MacDonald, J. Anvik, J. Schaeffer, D. Szafron, K. Tan
The advantages of pattern-based programming have been well documented in the sequential programming literature. However, patterns have yet to make their way into mainstream parallel computing, even though several research tools support them. There are two critical shortcomings of pattern (or template) based systems for parallel programming: lack of extensibility and poor performance. This paper describes our approach for addressing these problems in the CO₂P₃S parallel programming system. CO₂P₃S supports multiple levels of abstraction, allowing the user to design an application with high-level patterns but move to lower levels of abstraction for performance tuning. Patterns are implemented as parameterized templates, allowing the user to customize a pattern to meet their needs. CO₂P₃S generates code that is specific to the pattern/parameter combination selected by the user. The MetaCO₂P₃S tool addresses extensibility by giving users the ability to design and add new pattern templates to CO₂P₃S. Since the pattern templates are stored in a system-independent format, they are suitable for storing in a repository to be shared throughout the user community.
Title: "Pattern-based parallel programming"
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040856
María J. Martín, D. E. Singh, J. Touriño, F. F. Rivera
The goal of this work is the efficient parallel execution of loops with indirect array accesses, so that it can be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to be analyzed. In this paper we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other based on a block-cyclic distribution. Experimental results show that the two strategies are complementary and that the choice of the best alternative depends on features of the loop pattern.
Title: "Exploiting locality in the run-time parallelization of irregular loops"
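One common run-time approach for loops with indirect array accesses is an inspector/executor scheme: an inspector examines the indirection array and groups iterations into dependence-free wavefronts that can each run in parallel. The sketch below illustrates that generic idea only; it is not the paper's graph-partitioning or block-cyclic strategy:

```python
def inspector(indices):
    """Group loop iterations (e.g. of `a[indices[i]] += f(i)`) into
    wavefronts: iterations touching the same array element are placed in
    later wavefronts, so each wavefront is free of cross-iteration
    dependences and can execute in parallel."""
    last_wave = {}   # array element -> latest wavefront that touched it
    waves = []
    for it, idx in enumerate(indices):
        w = last_wave.get(idx, -1) + 1   # must run after the last toucher
        if w == len(waves):
            waves.append([])
        waves[w].append(it)
        last_wave[idx] = w
    return waves
```

For an indirection array [0, 1, 0, 2, 1], iterations 0, 1, and 3 touch distinct elements and form the first wavefront, while iterations 2 and 4 reuse elements and are deferred to a second one.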
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040906
K. Nakano
A broadcast communication model (BCM) is a distributed system with no central arbiter, populated by n processing units referred to as stations. The stations can communicate by broadcasting/receiving data packets on one of k communication channels. We assume that the stations run on batteries and expend power while broadcasting/receiving a data packet. Thus, the most important measure for evaluating algorithms on the BCM is the number of awake time slots, in which a station is broadcasting/receiving a data packet. We also assume that the stations are identical and have no unique ID numbers, and that no station knows the number n of stations. Given n keys, one for each station, the ranking problem asks each station to determine the number of keys in the BCM smaller than its own key. The main contribution of the paper is an optimal randomized ranking algorithm on the k-channel BCM. Our algorithm solves the ranking problem, with high probability, in O(n/k + log n) time slots with no station being awake for more than O(log n) time slots. We also prove that any randomized ranking algorithm requires expected Ω(n/k + log n) time slots with at least one station being awake for expected Ω(log n) time slots. Therefore, our ranking algorithm is optimal.
Title: "An optimal randomized ranking algorithm on the k-channel broadcast communication model"
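The ranking problem itself is easy to state sequentially. The sketch below is only a correctness reference for the problem definition, not the randomized k-channel algorithm from the paper:

```python
def ranks(keys):
    """Rank of each station's key = number of keys in the system
    strictly smaller than it (keys assumed distinct, as one per station)."""
    return [sum(other < k for other in keys) for k in keys]
```

For keys [5, 1, 3] the ranks are [2, 0, 1]: two keys are smaller than 5, none smaller than 1, one smaller than 3.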
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040884
Xueyan Tang, Fan Zhang, S. Chanson
Streaming media is expected to become one of the most popular types of Web content in the future. Due to the increasing variety of client devices and the range of access speeds to the Internet, multimedia content may need to be transcoded to match a client's capabilities. With transcoding, both the network and the proxy CPU are potential bottlenecks for streaming media delivery. This paper discusses and compares various caching algorithms designed for transcoding proxies. In particular, we propose a new adaptive algorithm that dynamically selects an appropriate metric for adjusting the management policy. Experimental results show that the proposed algorithm significantly outperforms those that cache only untranscoded or only transcoded objects. Moreover, motivated by the characteristics of many video compression algorithms, we investigate partitioning a video object into sections based on frame type and handling them individually for proxy caching. It is found that partitioning improves performance when CPU power, rather than network bandwidth, is the limiting resource, particularly when the reference pattern is not highly skewed.
Title: "Streaming media caching algorithms for transcoding proxies"
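A cost-aware eviction metric of the kind such caching algorithms weigh can be sketched as a generic cost/size utility. This is only an illustrative assumption about how re-acquisition cost and cache footprint might be traded off, not the paper's adaptive algorithm:

```python
def eviction_order(objects):
    """Sort cached objects so those cheapest to re-acquire per byte are
    evicted first. Each object carries the cost to refetch (or
    re-transcode) it and the space it occupies in the cache."""
    return sorted(objects, key=lambda o: o["cost"] / o["size"])

# Hypothetical cache contents: a large object that is cheap to refetch
# vs. a small one that needs an expensive transcode to reproduce.
cache = [
    {"name": "raw.mpg",   "cost": 2.0, "size": 8.0},
    {"name": "small.3gp", "cost": 6.0, "size": 2.0},
]
```

Under this utility the large, cheaply refetched object is the first eviction candidate, even though it frees more space, because reproducing the transcoded object would cost more CPU per byte kept.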
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040873
Pei Zheng, L. Ni
The development and implementation of new network protocols and applications need accurate, scalable, reconfigurable, and inexpensive tools for debugging, testing, performance tuning, and evaluation. Network emulation provides a fully controllable laboratory network environment in which protocols and applications can be evaluated against predefined network conditions and traffic dynamics. In this paper, we present EMPOWER, a new framework for network emulation. EMPOWER is capable of generating a suitable network model from the information of an emulated network and then mapping the model to an emulation configuration in the EMPOWER laboratory network environment. It is highly scalable, not only because the number of emulator nodes may be increased without significantly increasing the emulation time or resorting to parallel simulation, but also because the network mapping scheme allows flexible port aggregation and derivation. By dynamically configuring a virtual device, effects such as link bandwidth, packet delay, packet loss rate, and out-of-order delivery can be emulated.
Title: "EMPOWER: a scalable framework for network emulation"
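A virtual device that imposes link effects on passing packets might look like the minimal sketch below (probabilistic loss and a fixed delay only). The function name and interface are assumptions for illustration, not EMPOWER's API:

```python
import random

def emulate_link(packets, loss_rate, delay_ms, seed=0):
    """Drop each packet with probability loss_rate and delay survivors by
    a fixed delay_ms, as a configurable virtual device might. Returns
    (packet, arrival_delay_ms) pairs for packets that get through."""
    rng = random.Random(seed)    # seeded so emulation runs are repeatable
    delivered = []
    for p in packets:
        if rng.random() < loss_rate:
            continue             # packet lost on the emulated link
        delivered.append((p, delay_ms))
    return delivered
```

Bandwidth limits and out-of-order delivery could be layered on the same way, by queueing packets and reordering or pacing the delivered list.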
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040885
Xin Chen, Xiaodong Zhang
Prediction by partial match (PPM) is a commonly used technique in Web prefetching, where prefetching decisions are made based on historical URLs in a dynamically maintained Markov prediction tree. Existing approaches either store the URL nodes widely, building the tree with a fixed height in each branch, or store only the branches with frequently accessed URLs. Building popularity information into the Markov prediction tree, we propose a new prefetching model called popularity-based PPM. In this model, the tree is dynamically updated with a variable height in each set of branches: a popular URL leads a set of long branches, and a less popular document leads a set of short ones. Since the majority of root nodes are popular URLs in our approach, the space allocated for storing nodes is effectively utilized. We have also included two additional optimizations in this model: (1) directly linking a root node to duplicated popular nodes in a surfing path, to give popular URLs more consideration for prefetching; and (2) applying a space optimization after the tree is built, to further remove less popular nodes. Our trace-driven simulation results show a significant space reduction and improved prediction accuracy for the proposed prefetching technique.
Title: "Popularity-based PPM: an effective Web prefetching technique for high accuracy and low storage"
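The flavor of PPM-style prediction can be illustrated with an order-1 Markov sketch, where per-successor counts stand in for popularity. This is a simplification for illustration, not the paper's variable-height tree structure:

```python
from collections import defaultdict

def build_model(history):
    """Count, for each URL in an access history, how often each
    successor URL immediately follows it."""
    model = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(history, history[1:]):
        model[cur][nxt] += 1
    return model

def prefetch_candidate(model, current):
    """Predict the most frequently observed successor of the current
    URL, i.e. the document worth prefetching next."""
    successors = model.get(current)
    return max(successors, key=successors.get) if successors else None
```

Given the history ["a", "b", "a", "b", "a", "c"], "b" has followed "a" twice and "c" once, so "b" is the prefetch candidate after "a"; "c" has no recorded successor, so nothing is prefetched after it.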