Proceedings International Conference on Parallel Processing: latest articles

Out-of-order instruction fetch using multiple sequencers

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040855

P. Oberoi, G. Sohi

Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide fetch, is inefficient for modern out-of-order processors. Instead of the usual approach of fetching large blocks of instructions from a single point in the program, we propose a high-bandwidth fetch mechanism that fetches small blocks of instructions from multiple points in a program. In this paper, we demonstrate that it is possible to achieve high-bandwidth fetch by using multiple narrow fetch units operating in parallel. Our mechanism performs as well as a trace cache, does not waste cache space, is more resilient to instruction cache misses, and is a natural fit for techniques that require fetching multiple threads, like multithreading, dual-path execution, and speculative threads.

Citations: 9

A new mechanism for congestion and deadlock resolution

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040862

Y. Song, T. Pinkston

Efficient and reliable communication is essential for achieving high performance in a networked computing environment. Limited network resources bring about unavoidable competition among in-flight packets, resulting in network congestion and possibly deadlock. Many techniques have been proposed to improve performance by efficiently handling network congestion and deadlock. However, none of them provide an efficient way of accelerating the movement of packets involved in congestion onward to their destinations. In this paper, we propose a new mechanism for the detection and resolution of network congestion and deadlocks. The proposed mechanism is based on increasing the scheduling priority of packets involved in congestion and providing necessary resources for those packets to make forward progress. Simulation results show that the proposed technique outperforms previously proposed techniques by effectively dispersing network congestion.

Citations: 18

Minimal sensor integrity in sensor grids

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040914

R. Kannan, S. Sarangi, S. Ray, S. Iyengar

Given the increasing importance of optimal sensor deployment for battlefield strategists, the converse problem of reacting to a particular deployment by an enemy is equally significant and not yet addressed in a quantifiable manner in the literature. We address this issue by modeling a two-stage game in which the opponent deploys sensors to cover a sensor field and we attempt to maximally reduce his coverage at minimal cost. In this context, we introduce the concept of minimal sensor integrity, which measures the vulnerability of any sensor deployment. We find the best response by quantifying the merits of each response. While the problem of optimally deploying sensors subject to coverage constraints is NP-complete, in this paper we show that the best response (i.e. the maximum vulnerability) can be computed in polynomial time for sensors with arbitrary coverage capabilities deployed over points in any dimensional space. In the special case when sensor coverages form an interval graph (as in a linear grid), we describe a better $O(\min(M^2, NM))$ dynamic programming algorithm.

Citations: 8

Reliable MAC layer multicast in IEEE 802.11 wireless networks

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1002/wcm.129

Min-Te Sun, Lifei Huang, Shao-Cheng Wang, A. Arora, T. Lai

Multicast/broadcast is an important service primitive in networks. The IEEE 802.11 multicast/broadcast protocol is based on the basic access procedure of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). This protocol does not provide any media access control (MAC) layer recovery on multicast/broadcast frames. As a result, the reliability of the multicast/broadcast service is reduced due to the increased probability of lost frames resulting from interference or collisions. In this paper, we propose a reliable Batch Mode Multicast MAC protocol, BMMM, which substantially reduces the number of contention phases and thus considerably reduces the time required for a multicast/broadcast. We then propose a Location Aware Multicast MAC protocol, LAMM, that uses station location information to further improve upon BMMM. Extensive analysis and simulation results validate the reliability and efficiency of our multicast MAC protocols.

Citations: 264

Dead timestamp identification in Stampede

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040864

Nissim Harel, Hasnain A. Mandviwala, K. Knobe, U. Ramachandran

Stampede is a parallel programming system to support computationally demanding applications including interactive vision, speech and multimedia collaboration. The system alleviates concerns such as communication, synchronization, and buffer management in programming such real-time stream-oriented applications. Threads are loosely connected by channels that hold timestamped data items. There are two performance concerns when programming with Stampede. The first is space, namely, ensuring that memory is not wasted on items that are not fully processed. The second is time, namely, ensuring that processing resource is not wasted on a timestamp that is not fully processed. In this paper we introduce a single unifying framework, dead timestamp identification, that addresses both the space and time concerns simultaneously. Dead timestamps on a channel represent garbage. Dead timestamps at a thread represent computations that need not be performed. This framework has been implemented in the Stampede system. Experimental results showing the space advantage of this framework are presented. Using a color-based people tracker application, we show that the space advantage can be significant (up to 40%) compared to the previous garbage collection techniques in Stampede.
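
The idea that dead timestamps on a channel represent garbage can be illustrated with a small sketch. It is not the Stampede API: the function name, the dictionary layout, and the rule used here (an item is dead once its timestamp is below the lowest timestamp any consumer thread may still request) are all simplifying assumptions made for illustration.

```python
from typing import Dict, Set


def dead_timestamps(channel_items: Dict[int, bytes],
                    consumer_lower_bounds: Dict[str, int]) -> Set[int]:
    """Return the timestamps on a channel that no consumer can still request.

    channel_items maps a timestamp to its buffered data item.
    consumer_lower_bounds maps a (hypothetical) consumer thread name to the
    smallest timestamp that thread may still ask for.
    """
    if not consumer_lower_bounds:
        # Without any guarantee from consumers, nothing can be declared dead.
        return set()
    # Assumed rule: everything below the minimum guarantee is garbage.
    horizon = min(consumer_lower_bounds.values())
    return {ts for ts in channel_items if ts < horizon}


# Hypothetical channel with four buffered frames and two consumer threads.
items = {1: b"f1", 2: b"f2", 5: b"f5", 7: b"f7"}
bounds = {"tracker": 5, "display": 3}
garbage = dead_timestamps(items, bounds)
```

In this toy setting the slowest consumer determines which items can be reclaimed, which is why both the space concern (buffered items) and the time concern (work on timestamps nobody will read) hinge on the same guarantee.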

Citations: 10

Effective methodology for deadlock-free minimal routing in InfiniBand networks

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040897

J. Sancho, A. Robles, J. Flich, P. López, J. Duato

The InfiniBand Architecture (IBA) defines a switch-based network with point-to-point links whose topology is arbitrarily established by the customer. We propose a simple and effective methodology for designing deadlock-free routing strategies that are able to route packets through minimal paths in InfiniBand networks. This methodology can meet the trade-off between network performance and the number of resources dedicated to deadlock avoidance. Evaluation results show that the resulting routing strategies significantly outperform up*/down* routing. In particular, throughput improvement ranges, on average, from 1.33 for small networks to 4.05 for large networks. Also, it is shown that just two virtual lanes and three service levels are enough to achieve more than 80% of the throughput improvement achieved by the best proposed routing strategy (the one that always provides minimal paths without limiting the number of resources).

Citations: 49

A lower-bound algorithm for minimizing network communication in real-time systems

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040890

Cecilia Ekelin, Jan Jonsson

In this paper, we propose a pseudo-polynomial-time lower-bound algorithm for the problem of assigning and scheduling real-time tasks in a distributed system such that the network communication is minimized. The key feature of our algorithm is translating the task assignment problem into the so-called k-cut problem of a graph, which is known to be solvable in polynomial time for fixed k. Experiments show that the lower bound computed by our algorithm is in fact optimal in up to 89% of the cases and increases the speed of an overall optimization algorithm by a factor of two on average.
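
To make the link between task assignment and graph cuts concrete, the sketch below computes the communication cost of one assignment as the weight of the corresponding cut. It is only an illustration of the correspondence, not the authors' lower-bound algorithm; the task names, edge weights, and the function `cut_weight` are assumptions.

```python
from typing import Dict, Tuple


def cut_weight(edges: Dict[Tuple[str, str], int],
               assignment: Dict[str, int]) -> int:
    """Sum the weights of edges whose endpoints run on different processors.

    edges maps a pair of communicating tasks to the message volume between
    them; assignment maps each task to a processor index. Tasks placed on
    the same processor communicate locally and add nothing to the network.
    """
    return sum(weight
               for (src, dst), weight in edges.items()
               if assignment[src] != assignment[dst])


# Hypothetical task graph with four tasks split over k = 2 processors.
edges = {("a", "b"): 4, ("b", "c"): 1, ("c", "d"): 3, ("a", "d"): 2}
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
network_cost = cut_weight(edges, assignment)
```

Any assignment to k processors partitions the task graph into at most k parts, so the minimum k-cut weight cannot exceed the network communication of any feasible assignment, which is what makes it usable as a lower bound.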

在本文中，我们提出了一种伪多项式时间下界算法来解决分布式系统中实时任务的分配和调度问题，使网络通信最小化。该算法的关键特征是将任务分配问题转化为图的k-切问题。对于固定k，已知可以在多项式时间内解决。实验表明，由我们的算法计算的下界实际上在高达89%的情况下是最优的，并且平均将整体优化算法的速度提高了两倍。

Citations: 7

Dynamic service composition for wireless Web access

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040899

S. Chuang, A. Chan, Jiannong Cao

Describes a Web proxy architecture called WebPADS, short for "Web Proxy for actively deployable services." WebPADS was developed to enhance Web applications running on a wireless network. WebPADS provides mechanisms to automatically locate and configure a flexible and adaptive wireless Web proxy. In addition, it provides a framework that facilitates the development of add-on services, where the services can be actively deployed and migrated across Web proxies, in order to adapt to the changing wireless environment.

Citations: 7

Optimal video replication and placement on a cluster of video-on-demand servers

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040912

Xiaobo Zhou, Chengzhong Xu

A cost-effective approach to building up scalable video-on-demand (VoD) servers is to couple a number of VoD servers together in a cluster. In this article, we study a crucial video replication and placement problem in a distributed storage VoD cluster for high quality and high availability services. We formulate it as a combinatorial optimization problem with objectives of maximizing the encoding bit rate and the number of replicas of each video and balancing the workload of the servers. It is subject to the constraints of the storage capacity and the outgoing network bandwidth of the servers. Under the assumption of single fixed encoding bit rate for all videos, we give an optimal replication algorithm and a bounded-placement algorithm for videos with different popularities. To reduce the complexity of the replication algorithm, we present an efficient algorithm that utilizes the Zipf-like video popularity distributions to approximate the optimal solution. For videos with scalable encoding bit rates, we propose a heuristic algorithm based on simulated annealing. We conduct a comprehensive performance evaluation of the algorithms and demonstrate their effectiveness via simulations over a synthetic workload set.
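
As a rough illustration of how a Zipf-like popularity distribution can drive replication, the sketch below ranks videos by popularity and hands out replicas greedily. It is not the paper's algorithm: the skew parameter, the greedy rule, and both function names are assumptions chosen to keep the example small.

```python
from typing import List


def zipf_popularity(n_videos: int, skew: float) -> List[float]:
    """Return request probabilities proportional to 1 / rank**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_replicas(popularity: List[float],
                      total_replicas: int,
                      n_servers: int) -> List[int]:
    """Greedy sketch: one replica per video, then favor the most loaded replica.

    Each extra replica goes to the video whose per-replica request share is
    currently largest. No video gets more replicas than there are servers.
    """
    replicas = [1] * len(popularity)
    for _ in range(total_replicas - len(popularity)):
        candidates = [i for i, r in enumerate(replicas) if r < n_servers]
        if not candidates:
            break  # every video is already on every server
        busiest = max(candidates, key=lambda i: popularity[i] / replicas[i])
        replicas[busiest] += 1
    return replicas


# Hypothetical cluster: 4 videos, 4 servers, room for 8 replicas in total.
popularity = zipf_popularity(4, 1.0)
replicas = allocate_replicas(popularity, 8, 4)
```

Spreading the per-replica load evenly in this way is one simple route toward the load-balancing objective named in the abstract; storage and bandwidth constraints are left out of the sketch.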

Citations: 62

Linux/SimOS - a simulation environment for evaluating high-speed communication systems

Proceedings International Conference on Parallel Processing

Pub Date : 2002-08-18 DOI: 10.1109/ICPP.2002.1040874

Chulho Won, Ben Lee, Chansu Yu, S. Moh, Yong-Youn Kim, K. Park

This paper presents Linux/SimOS, a Linux operating system port to SimOS, which is a complete machine simulator from Stanford. The motivation for Linux/SimOS is to alleviate the limitations of SimOS, which only supports proprietary operating systems. The contributions made in this paper are two-fold: First, the major modifications that were necessary to run Linux on SimOS are described. Second, a detailed analysis of the UDP/IP protocol and M-VIA is performed to demonstrate the capabilities of Linux/SimOS. The simulation study shows that Linux/SimOS is capable of capturing all aspects of communication performance, including the effects of the kernel, device drivers, and network interface.

Citations: 11
