Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040861
Yuanyuan Yang, Jianchao Wang
Many emerging network applications, such as teleconferencing and information services, require group communication, in which messages from one or more senders are delivered to a large number of receivers. We consider efficient network support for a key type of group communication: conferencing. A conference is a group of members in a network who communicate with each other within the group. In our recent work (Yang, 2001), we proposed a design for a conference network that can support multiple disjoint conferences. The major component of the network is an enhanced multistage switching network that interconnects switch modules with fan-in and fan-out capability. The multistage network used is modified from an indirect binary cube network by relaying all internal outputs at each stage through multiplexers to the outputs of the network. Each conference is realized in an indirect binary cube-like subnetwork depending on its location. A natural question is: can we directly adopt a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, to obtain a conference network with a more regular structure, a simpler self-routing algorithm, and lower hardware cost? This paper aims to answer this question. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, that is, the maximum number of conflicting parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network.
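To illustrate the self-routing property such multistage networks offer, here is a minimal sketch (an assumption-laden illustration of classic destination-tag routing in an omega network, not the paper's conference-routing algorithm): each stage applies a perfect shuffle to the input label and then substitutes one destination bit via the switch setting.

```python
def omega_route(src, dst, n_bits):
    """Trace a packet's self-routing path through an N=2^n_bits omega network.

    At each stage the label is cyclically left-shifted (perfect shuffle) and
    its low bit is replaced by the next destination bit (the switch setting).
    Returns the sequence of labels visited, ending at dst.
    """
    mask = (1 << n_bits) - 1
    node = src
    path = [node]
    for stage in range(n_bits):
        # perfect shuffle: rotate the n_bits label left by one position
        node = ((node << 1) | (node >> (n_bits - 1))) & mask
        # switch setting: overwrite the low bit with destination bit `stage`
        dst_bit = (dst >> (n_bits - 1 - stage)) & 1
        node = (node & ~1) | dst_bit
        path.append(node)
    return path
```

After n_bits stages every destination bit has been substituted in, so the path always terminates at the destination regardless of the source, which is what makes the routing "self-routing": no global computation is needed.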
Title: A class of multistage conference switching networks for group communication
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040895
Umit Rencuzogullari, S. Dwarkadas
Clusters of workstations (COWs) offer high performance relative to their cost. Generally, these clusters operate as autonomous systems running independent copies of the operating system, where access to machines is not controlled and all users enjoy the same access privileges. While these features are desirable and reduce operating costs, they have adverse effects on parallel applications running on these clusters. Load imbalances are common for parallel applications on COWs due to: 1) a variable amount of load on nodes caused by an inherent lack of parallelism, 2) variable resource availability on nodes, and 3) independent scheduling decisions made by the independent schedulers on each node. Our earlier study has shown that an approach combining static program analysis, dynamic load balancing, and scheduler cooperation is effective in countering the adverse effects mentioned above. In our current study, we investigate the scalability of our approach as the number of processors is increased. We further relax the requirement of global synchronization, avoiding the need to use barriers and allowing the use of any other synchronization primitives while still achieving dynamic load balancing. The use of alternative synchronization primitives avoids the inherent vulnerability of barriers to load imbalance. It also allows load balancing to take place at any point in the course of execution, rather than only at a synchronization point, potentially reducing the time the application runs imbalanced. Moreover, load readjustment decisions are made in a distributed fashion, eliminating any need for processes to globally synchronize in order to redistribute load.
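The core of such load readjustment can be pictured as redistributing loop iterations in proportion to each node's measured processing rate. The fragment below is a hypothetical sketch of that idea only; the function name and the simple proportional rule are assumptions, not the paper's combined static-analysis/scheduler-cooperation mechanism.

```python
def rebalance(work, rates):
    """Redistribute a total amount of work in proportion to per-node rates.

    work  -- current iteration counts per node (only their sum matters here)
    rates -- measured processing rates per node (higher = faster)
    Returns new per-node iteration counts with the same total.
    """
    total = sum(work)
    total_rate = sum(rates)
    shares = [int(total * r / total_rate) for r in rates]
    shares[0] += total - sum(shares)  # hand the rounding remainder to node 0
    return shares
```

A node observing that a neighbor runs at triple its rate would, under this rule, surrender iterations until the 3:1 split is reached, without any global barrier.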
Title: A technique for adaptation to available resources on clusters independent of synchronization methods used
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040915
Peter Sulatycke, K. Ghose
We present in-core and out-of-core parallel techniques for isosurface rendering based on the notion of span-space buckets. Our in-core technique makes conservative use of RAM and is amenable to parallelization. The out-of-core variant keeps the amount of data read in the search process to a minimum, visiting only the cells that intersect the isosurface. The out-of-core technique additionally minimizes disk I/O time through in-order seeking, interleaving data records on the disk, and overlapping computational and I/O threads. The overall isosurface rendering time achieved using our out-of-core span-space buckets is comparable to that of well-optimized in-core techniques that have enough RAM at their disposal to avoid thrashing. When the RAM size is limited, our out-of-core span-space bucket technique maintains its performance level, while in-core algorithms either start to thrash or must sacrifice performance for a smaller memory footprint.
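The span-space idea can be sketched briefly: each cell is represented by its scalar span (min, max), cells are grouped into buckets by quantized span coordinates, and a query for isovalue v only needs buckets whose min-coordinate is at or below v's bucket and whose max-coordinate is at or above it. The function names and the uniform quantization below are illustrative assumptions, not the paper's disk layout.

```python
from collections import defaultdict

def build_buckets(cells, n_buckets, vmin, vmax):
    """Group cells into span-space buckets keyed by quantized (min, max)."""
    width = (vmax - vmin) / n_buckets
    buckets = defaultdict(list)
    for cell_id, (lo, hi) in cells.items():
        i = min(int((lo - vmin) / width), n_buckets - 1)
        j = min(int((hi - vmin) / width), n_buckets - 1)
        buckets[(i, j)].append(cell_id)
    return buckets

def active_cells(buckets, cells, isovalue, n_buckets, vmin, vmax):
    """Return cells whose span contains the isovalue: lo <= v <= hi."""
    width = (vmax - vmin) / n_buckets
    k = min(int((isovalue - vmin) / width), n_buckets - 1)
    out = []
    # a bucket (i, j) can hold active cells only if i <= k <= j
    for (i, j), ids in buckets.items():
        if i <= k <= j:
            out.extend(c for c in ids if cells[c][0] <= isovalue <= cells[c][1])
    return out
```

Only candidate buckets are visited, and within them an exact span test filters boundary cases, so the search touches roughly the cells that actually intersect the isosurface.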
Title: Multithreaded isosurface rendering on SMPs using span-space buckets
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040903
C. Yeh, B. Parhami
We formulate array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required, and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1+o(1). The number of faults tolerated by ARTs ranges from o(min(n^(1-1/d), n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components, such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithms/programs and the faulty network/hardware. In effect, RAIL provides a virtual fault-free network to higher layers, while ordinary algorithms/programs are transformed through RAIL into corresponding robust algorithms/programs that can run on faulty networks.
Title: ART: robustness of meshes and tori for parallel and distributed computation
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040881
S. Bromling, S. MacDonald, J. Anvik, J. Schaeffer, D. Szafron, K. Tan
The advantages of pattern-based programming have been well documented in the sequential programming literature. However, patterns have yet to make their way into mainstream parallel computing, even though several research tools support them. Pattern-based (or template-based) systems for parallel programming have two critical shortcomings: lack of extensibility and poor performance. This paper describes our approach to addressing these problems in the CO2P3S parallel programming system. CO2P3S supports multiple levels of abstraction, allowing the user to design an application with high-level patterns but move to lower levels of abstraction for performance tuning. Patterns are implemented as parameterized templates, allowing the user to customize a pattern to meet their needs. CO2P3S generates code that is specific to the pattern/parameter combination selected by the user. The MetaCO2P3S tool addresses extensibility by giving users the ability to design and add new pattern templates to CO2P3S. Since the pattern templates are stored in a system-independent format, they are suitable for storing in a repository to be shared throughout the user community.
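The division of labor in a pattern template can be shown in miniature: the pattern fixes the parallel structure, while the user supplies only the application-specific code as a parameter. The task-farm sketch below is a generic illustration in Python under those assumptions; it is not CO2P3S itself, which generates framework code from graphical pattern selections.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, tasks, n_workers=4):
    """A minimal 'task farm' pattern template.

    The parallel structure (pool of workers consuming independent tasks,
    results collected in order) is fixed by the pattern; the user-supplied
    `worker` function is the template's parameter.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))
```

A user "instantiates" the pattern by passing their sequential kernel, e.g. `farm(render_tile, tiles)`, and never writes thread-management code; performance tuning in a layered system would then mean opening up and editing the generated structure rather than this one-call interface.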
Title: Pattern-based parallel programming
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040856
María J. Martín, D. E. Singh, J. Touriño, F. F. Rivera
The goal of this work is the efficient parallel execution of loops with indirect array accesses, to be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot always be determined at compile time because, in many cases, they involve input data that are only known at run time and/or the access pattern is too complex to analyze. In this paper, we propose run-time strategies for the parallelization of these loops. Our approaches focus not only on extracting parallelism among iterations of the loop, but also on exploiting data access locality to improve memory hierarchy behavior and, thus, the overall program speedup. Two strategies are proposed: one based on graph partitioning techniques and the other based on a block-cyclic distribution. Experimental results show that the two strategies are complementary, and the choice of the best alternative depends on features of the loop pattern.
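Run-time parallelization of such loops is commonly organized as an inspector/executor scheme: an inspector scans the index array once and groups iterations into wavefronts that can each run in parallel, because no two iterations in the same wavefront touch the same array element. This sketch shows only that generic inspector idea, under the simplifying assumption of one indirect access per iteration; it is not the paper's graph-partitioning or block-cyclic strategy.

```python
def inspector(index_array):
    """Group loop iterations i of `a[index_array[i]] = f(...)` into wavefronts.

    Iterations that touch the same element are placed in successive
    wavefronts, preserving their original order; iterations within one
    wavefront touch distinct elements and may execute concurrently.
    """
    last_wave = {}  # element index -> last wavefront that touched it
    waves = []
    for it, idx in enumerate(index_array):
        w = last_wave.get(idx, -1) + 1  # one wave after the last conflict
        if w == len(waves):
            waves.append([])
        waves[w].append(it)
        last_wave[idx] = w
    return waves
```

The executor then runs each wavefront's iterations in parallel and synchronizes between wavefronts; a locality-aware variant would additionally order or partition each wavefront so that iterations touching nearby elements land on the same processor.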
Title: Exploiting locality in the run-time parallelization of irregular loops
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040906
K. Nakano
A broadcast communication model (BCM) is a distributed system with no central arbiter, populated by n processing units referred to as stations. The stations communicate by broadcasting/receiving data packets on one of k communication channels. We assume that the stations run on batteries and expend power while broadcasting/receiving a data packet. Thus, the most important measure for evaluating algorithms on the BCM is the number of awake time slots, in which a station is broadcasting/receiving a data packet. We also assume that the stations are identical and have no unique ID number, and that no station knows the number n of stations. Given n keys, one per station, the ranking problem asks each station to determine the number of keys in the BCM smaller than its own key. The main contribution of the paper is an optimal randomized ranking algorithm on the k-channel BCM. Our algorithm solves the ranking problem, with high probability, in O(n/k + log n) time slots, with no station being awake for more than O(log n) time slots. We also prove that any randomized ranking algorithm must run in expected Ω(n/k + log n) time slots, with at least one station being awake for expected Ω(log n) time slots. Therefore, our ranking algorithm is optimal.
Title: An optimal randomized ranking algorithm on the k-channel broadcast communication model
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040884
Xueyan Tang, Fan Zhang, S. Chanson
Streaming media is expected to become one of the most popular types of Web content in the future. Due to the increasing variety of client devices and the range of access speeds to the Internet, multimedia content may need to be transcoded to match a client's capabilities. With transcoding, both the network and the proxy CPU are potential bottlenecks for streaming media delivery. This paper discusses and compares various caching algorithms designed for transcoding proxies. In particular, we propose a new adaptive algorithm that dynamically selects an appropriate metric for adjusting the management policy. Experimental results show that the proposed algorithm significantly outperforms those that cache only untranscoded or only transcoded objects. Moreover, motivated by the characteristics of many video compression algorithms, we investigate partitioning a video object into sections based on frame type and handling them individually for proxy caching. We find that partitioning improves performance when CPU power rather than network bandwidth is the limiting resource, particularly when the reference pattern is not highly skewed.
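The trade-off such a cache weighs can be made concrete with a per-byte profit function: caching an untranscoded object saves future network fetches, while caching a transcoded version additionally saves the CPU cost of re-transcoding. The function below is a hypothetical illustration of that trade-off; its name, parameters, and weighting are assumptions, not the paper's adaptive metric.

```python
def cache_profit(ref_rate, size_bytes, net_cost, cpu_cost, is_transcoded):
    """Per-byte profit of keeping an object version in the cache.

    ref_rate      -- estimated references per unit time to this version
    size_bytes    -- bytes of cache space the version occupies
    net_cost      -- cost of fetching the object over the network on a miss
    cpu_cost      -- cost of transcoding the object on the proxy CPU
    is_transcoded -- True if the cached version is already transcoded
    """
    saving = net_cost + (cpu_cost if is_transcoded else 0.0)
    return ref_rate * saving / size_bytes
```

An adaptive policy could then rank eviction candidates by this profit and shift the relative weight of `net_cost` versus `cpu_cost` depending on which resource is currently the bottleneck.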
Title: Streaming media caching algorithms for transcoding proxies
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040873
Pei Zheng, L. Ni
The development and implementation of new network protocols and applications need accurate, scalable, reconfigurable, and inexpensive tools for debugging, testing, performance tuning, and evaluation. Network emulation provides a fully controllable laboratory network environment in which protocols and applications can be evaluated against predefined network conditions and traffic dynamics. In this paper, we present a new network emulation framework, EMPOWER. EMPOWER is capable of generating a suitable network model from the information of an emulated network, and then mapping the model to an emulation configuration in the EMPOWER laboratory network environment. It is highly scalable, not only because the number of emulator nodes may be increased without significantly increasing the emulation time or requiring parallel simulation, but also because the network mapping scheme allows flexible port aggregation and derivation. By dynamically configuring a virtual device, effects such as link bandwidth, packet delay, packet loss rate, and out-of-order delivery can be emulated.
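What such a virtual device does per packet can be sketched in a few lines: drop with some probability, queue behind earlier packets at the link's serialization rate, and add a fixed propagation delay. The class below is a toy model under those assumptions; its name and interface are illustrative, not EMPOWER's actual configuration API.

```python
import random

class EmulatedLink:
    """Toy virtual device imposing bandwidth, delay, and loss on packets."""

    def __init__(self, bandwidth_bps, delay_s, loss_rate, seed=0):
        self.bandwidth = bandwidth_bps
        self.delay = delay_s
        self.loss = loss_rate
        self.rng = random.Random(seed)
        self.busy_until = 0.0  # when the link finishes its current packet

    def send(self, now, size_bytes):
        """Return the packet's arrival time, or None if it is dropped."""
        if self.rng.random() < self.loss:
            return None                           # random loss
        start = max(now, self.busy_until)         # queue behind earlier packets
        tx = size_bytes * 8 / self.bandwidth      # serialization time
        self.busy_until = start + tx              # link occupied until then
        return start + tx + self.delay            # plus propagation delay
```

Reordering could be layered on top by occasionally perturbing arrival times; in a real emulator these parameters would be reconfigurable at run time to reproduce changing network conditions.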
Title: EMPOWER: a scalable framework for network emulation
Pub Date: 2002-08-18 | DOI: 10.1109/ICPP.2002.1040885
Xin Chen, Xiaodong Zhang
Prediction by partial match (PPM) is a commonly used technique in Web prefetching, where prefetching decisions are made based on historical URLs in a dynamically maintained Markov prediction tree. Existing approaches either store URL nodes broadly by building the tree with a fixed height in each branch, or store only the branches with frequently accessed URLs. Building popularity information into the Markov prediction tree, we propose a new prefetching model, called popularity-based PPM. In this model, the tree is dynamically updated with a variable height in each set of branches, so that a popular URL leads a set of long branches while a less popular document leads a set of short ones. Since the majority of root nodes are popular URLs in our approach, the space allocated for storing nodes is effectively utilized. We include two additional optimizations in this model: (1) directly linking a root node to duplicated popular nodes in a surfing path, to give popular URLs more consideration for prefetching; and (2) optimizing space after the tree is built to further remove less popular nodes. Our trace-driven simulation results show a significant space reduction and improved prediction accuracy for the proposed prefetching technique.
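The underlying Markov prediction step is easy to show in miniature: count how often each URL follows the current one, and prefetch the most frequent successor. The order-1 sketch below illustrates only that base mechanism; the paper's contribution, a variable-height tree whose branch depth tracks URL popularity, is not modeled here.

```python
from collections import defaultdict

class PPMPredictor:
    """Order-1 Markov predictor over a URL access stream (illustrative)."""

    def __init__(self):
        # counts[u][v] = times URL v was accessed immediately after URL u
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def access(self, url):
        """Record an access, updating the transition counts."""
        if self.prev is not None:
            self.counts[self.prev][url] += 1
        self.prev = url

    def predict(self):
        """Return the most likely next URL, or None with no history."""
        if self.prev is None or not self.counts[self.prev]:
            return None
        nxt = self.counts[self.prev]
        return max(nxt, key=nxt.get)
```

A popularity-based variant would, roughly, extend this to longer contexts only for URLs whose access counts exceed a threshold, so that frequently visited pages get deep (more accurate) branches while rare pages stay shallow and cheap to store.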
Title: Popularity-based PPM: an effective Web prefetching technique for high accuracy and low storage