Global combine on mesh architectures with wormhole routing
M. Barnett, R. Littlefield, D. G. Payne, R. V. D. Geijn
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262873
Abstract: Several algorithms are discussed for implementing global combine (summation) on distributed memory computers using a two-dimensional mesh interconnect with wormhole routing. These include algorithms that are asymptotically optimal for short vectors (O(log p) for p processing nodes) and for long vectors (O(n) for n data elements per node), as well as hybrid algorithms that are superior for intermediate n. Performance models are developed that include the effects of link conflicts and other characteristics of the underlying communication system. The models are validated using experimental data from the Intel Touchstone DELTA computer. Each of the combine algorithms is shown to be superior under some circumstances.
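The short-vector O(log p) bound mentioned in the abstract is the classic recursive-doubling pattern. As an illustrative simulation sketch (not the paper's algorithms or code; all names invented here), the following shows how p nodes can each obtain the elementwise global sum in log2(p) exchange-and-add rounds:

```python
# Sketch: recursive-doubling global combine, the O(log p) pattern that
# short-vector algorithms of this family follow. Each "node" holds a local
# vector; after log2(p) rounds of pairwise exchange-and-add, every node
# holds the elementwise global sum.

def global_combine(vectors):
    """Simulate recursive doubling; assumes p is a power of two."""
    p = len(vectors)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two node count"
    data = [list(v) for v in vectors]          # working copy per node
    step = 1
    while step < p:
        # Node i exchanges with partner i XOR step, then both sides
        # add the received vector into their local partial sum.
        data = [[a + b for a, b in zip(data[i], data[i ^ step])]
                for i in range(p)]
        step <<= 1
    return data

result = global_combine([[1, 2], [3, 4], [5, 6], [7, 8]])
# every node now holds the global sum [16, 20]
```

Roughly speaking, long-vector schemes in this family instead pipeline partial sums across the mesh to reach the O(n) per-node bound, and the hybrids the paper studies switch between the two regimes at intermediate n.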
A framework for predicting delay due to job interactions in a 2-D mesh multicomputer
Dugki Min, M. Mutka
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262905
Abstract: The authors develop expressions for predicting contention delay in wormhole-routed 2-D mesh multicomputers. The detrimental effect of contention caused by interference within jobs has led them to analyze two kinds of communication contention. Starting contention occurs when a processor attempts to access the network at the first hop on its route from source to destination. Intermediate contention has different characteristics: it is the contention facing a communication path as the message arrives at intermediate nodes along that route. They describe how their expressions are developed and relate them to the problem of evaluating interference within a job assigned to a multicomputer.
2D and 3D optimal parallel image warping
C. Wittenbrink, A. Somani
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262901
Abstract: Spatial image warping is useful for image processing and graphics. The authors present optimal concurrent-read-exclusive-write (CREW) and exclusive-read-exclusive-write (EREW) parallel-random-access-machine (PRAM) algorithms that achieve O(1) asymptotic run time. The significant result is the creative processor assignment that yields an EREW PRAM forward direct warp algorithm. The forward algorithm computes any nonscaling affine transform. The EREW algorithm is the most efficient in practice: a 16K-processor MasPar MP-1 can rotate a 4-million-element image in under a second and a 2-million-element volume in half a second. This high performance allows interactive viewing of volumes from arbitrary viewpoints and demonstrates linear speedup.
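A forward (source-to-destination) warp of the kind the abstract describes can be sketched in NumPy. This is a hedged illustration under simplifying assumptions — nearest-neighbor rounding, and no treatment of the write collisions that the paper's processor assignment is designed to avoid; all names are invented here:

```python
import numpy as np

def forward_warp(image, matrix):
    """Push every source pixel through a 2x2 affine matrix and round to the
    nearest destination cell. One write per source pixel is the property an
    exclusive-write (EREW) processor assignment exploits; resolving rounding
    collisions and holes is exactly what the paper's assignment addresses."""
    h, w = image.shape
    out = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    src = np.stack([xs.ravel(), ys.ravel()])        # 2 x (h*w) coordinates
    dx, dy = np.rint(matrix @ src).astype(int)      # forward-mapped targets
    keep = (dx >= 0) & (dx < w) & (dy >= 0) & (dy < h)
    out[dy[keep], dx[keep]] = image.ravel()[keep]
    return out

img = np.arange(16).reshape(4, 4)
same = forward_warp(img, np.eye(2))                 # identity transform
```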
A separation between reconfigurable mesh models
P. MacKenzie
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262860
Abstract: The author proves separations between two models of the reconfigurable mesh (rmesh): the cross-over model and the non-cross-over model. Specifically, he shows that in the non-cross-over model, a k×n rmesh requires Ω((log n)/k) time to compute the parity of n bits stored one per column, and a √n×√n rmesh requires Ω(log* n) time to compute the parity of n bits stored one per processor. In the cross-over model, in either case, the parity can be computed in constant time. The lower bounds given in this paper are the first separations demonstrated between the cross-over and non-cross-over models. These lower bounds do not rely on the bandwidth constraints of the mesh and do not restrict the instruction sets of the processors. Moreover, they are the first lower bounds for the rmesh that require only binary inputs.
Critical performance path analysis, and efficient code generation issues, for the Seamless architecture
D. L. Bright, S. Fineberg, B. H. Pease, M. L. Roderick, S. Sundaram, T. Casavant
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262813
Abstract: An analytical study of potential pathological performance areas of the Seamless architecture is presented. Seamless is a latency-tolerant, distributed memory, multiprocessor architecture. A key component of the philosophy of Seamless, however, is the use of standard, commodity components for a large part of the system. A discussion of the unavoidable implementation compromises imposed by this decision is presented, followed by a summary of some optimistic performance studies. Then an analytical study that parameterizes and predicts the worst-case impact of using standard components is provided. Finally, it is shown that these bottlenecks are manageable via careful generation of target machine code, so that the optimistic performance studies become realistic expectations for a range of program behaviors and granularities.
'Unstable threads' kernel interface for minimizing the overhead of thread switching
S. Inohara, Kazuhiko Kato, T. Masuda
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262872
Abstract: The performance of threads is limited primarily by the overhead of two kinds of switching: vertical switching (user/kernel domain switching) and horizontal switching (context switching between threads). Although these switches are indispensable in some situations, existing thread mechanisms incur unnecessary switches on multiprogrammed systems because of inappropriate interfaces between the operating system kernel and user-level programs. This paper presents a set of interfaces between the kernel and user-level programs that minimizes the overhead of both kinds of switching. The kernel provides 'unstable threads,' which are controlled solely by the kernel, while each user-level program monitors them and gives suggestions on their activities to the kernel through a memory area shared between the kernel and user address spaces. This new way of separating thread management minimizes the overhead of vertical and horizontal switching.
Cache coherence for shared memory multiprocessors based on virtual memory support
K. Petersen, Kai Li
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262854
Abstract: This paper presents a software cache coherence scheme that uses virtual memory (VM) support to maintain cache coherence on shared memory multiprocessors. Traditional VM translation hardware in each processor is used to detect memory access attempts that would violate cache coherence, and system software is used to enforce coherence. The implementation of this class of coherence schemes is very economical: it requires neither special multiprocessor hardware nor compiler support, and it easily incorporates different consistency models. The authors evaluated two consistency models for the VM-based approach: sequential consistency and lazy release consistency. The VM-based schemes are compared with a bus-based snoopy caching architecture, and the authors' trace-driven simulation results show that the VM-based cache coherence schemes are practical for small-scale shared memory multiprocessors.
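The mechanism can be pictured with a toy simulation — a hedged sketch with invented names, not the authors' implementation. Page "protections" here are plain state flags, and the "fault handlers" do what the real scheme does with VM hardware: fetch a copy on a read miss, and invalidate all other copies before a write (the sequential-consistency variant):

```python
# Sketch of page-protection-driven software coherence (simulation only).
INVALID, READ_ONLY, WRITABLE = "invalid", "read-only", "writable"

class Node:
    """One processor's page table; `directory` maps page -> set of holders."""
    def __init__(self, directory):
        self.directory = directory
        self.state = {}                        # page -> protection state

    def read(self, page):
        if self.state.get(page, INVALID) == INVALID:
            # read "fault": fetch a shared copy and map it read-only
            self.directory.setdefault(page, set()).add(self)
            self.state[page] = READ_ONLY
        return self.state[page]

    def write(self, page):
        if self.state.get(page, INVALID) != WRITABLE:
            # write "fault": invalidate every other copy, then map writable
            for holder in self.directory.get(page, set()) - {self}:
                holder.state[page] = INVALID
            self.directory[page] = {self}
            self.state[page] = WRITABLE
        return self.state[page]

directory = {}
a, b = Node(directory), Node(directory)
a.read(0)        # a holds a read-only copy of page 0
b.write(0)       # b's write fault invalidates a's copy
```

Under lazy release consistency, the second model the paper evaluates, this eager invalidation at each write fault would roughly be deferred to synchronization points instead.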
Task scheduling on a hypercube with link contentions
S. Kon'ya, T. Satoh
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262907
Abstract: The authors propose a new task scheduling algorithm that takes communication delays and link contentions into account to meet the requirements of a hypercube's communication model. It assigns each task a priority that includes communication delays, and selects the processor where the task will be allocated so as to minimize link contentions. Evaluation has been carried out using randomly generated graphs. The results show that almost linear speed-up is obtained when the number of tasks is 1024 and the number of processors ranges between 2 and 32. A ratio of communication time to processing time (C/P), which indicates the difficulty of scheduling task graphs with communication, is introduced and verifies the effectiveness of the proposed algorithm.
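The two ingredients the abstract names — priorities that include communication delays, and communication-aware processor selection — can be shown with a generic list-scheduling sketch. This is a hedged illustration, not the authors' algorithm (which additionally models link contention on the hypercube); all names are invented:

```python
# Sketch: list scheduling with communication delays. Priority is a "bottom
# level" that counts edge communication costs; each task is placed on the
# processor where it can start earliest, paying the communication delay
# only when a predecessor ran on a different processor.

def schedule(tasks, succ, cost, comm, n_procs):
    blevel = {}
    def bl(t):                      # longest remaining path, comm included
        if t not in blevel:
            blevel[t] = cost[t] + max(
                (comm[t, s] + bl(s) for s in succ.get(t, [])), default=0)
        return blevel[t]
    order = sorted(tasks, key=bl, reverse=True)   # topological for positive costs

    preds = {t: [p for p in tasks if t in succ.get(p, [])] for t in tasks}
    proc_free = [0] * n_procs
    placed = {}                                   # task -> (proc, finish time)
    for t in order:
        best = None
        for q in range(n_procs):
            ready = max((placed[p][1] + (comm[p, t] if placed[p][0] != q else 0)
                         for p in preds[t]), default=0)
            start = max(ready, proc_free[q])
            if best is None or start < best[0]:
                best = (start, q)
        start, q = best
        placed[t] = (q, start + cost[t])
        proc_free[q] = start + cost[t]
    return placed

succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {"a": 2, "b": 3, "c": 2, "d": 1}
comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
plan = schedule(["a", "b", "c", "d"], succ, cost, comm, n_procs=2)
```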
Towards understanding block partitioning for sparse Cholesky factorization
Sesh Venugopal, V. Naik
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262780
Abstract: The authors examine the effect of two partitioning parameters on the performance of block-based distributed sparse Cholesky factorization. They present results showing the trends in the effect of these parameters on computation speeds, communication costs, the extent of processor idling caused by load imbalances, and bookkeeping overheads. These results provide a better understanding of how to select the partitioning parameters so as to reduce the computation and communication costs without increasing the overhead costs or the load imbalance among the processors. Experimental results from a 32-processor iPSC/860 are presented.
New wormhole routing algorithms for multicomputers
R. Boppana, S. Chalasani
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262919
Abstract: Development of wormhole routing techniques has so far been largely independent of the results available for store-and-forward routing in the literature. The authors provide a general result that enables them to design deadlock-free wormhole routing algorithms from store-and-forward routing algorithms satisfying certain criteria. They illustrate this result by developing fully adaptive, deadlock-free wormhole routing algorithms from two well-known store-and-forward algorithms: the positive- and negative-hop algorithms, which are based on the number of hops taken by messages. They compare the negative-hop algorithm with the commonly used non-adaptive e-cube algorithm and the recently proposed, partially adaptive north-last algorithm.
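The hop-based idea can be pictured in a few lines. This is a hedged simplification with invented names, not the paper's construction: in a negative-hop scheme, nodes are leveled so that neighbors get different levels, a hop down a level is "negative", and a message that has taken k negative hops uses buffer/virtual-channel class k. Classes never decrease along a route, which is what rules out cyclic waiting:

```python
# Sketch: negative-hop channel classes on a 2-D mesh with a checkerboard
# (two-level) node labeling. Neighboring mesh nodes always differ in level,
# so every hop is either "up" (free) or "down" (negative, bumping the class).

def level(x, y):
    return (x + y) % 2                  # checkerboard labeling of the mesh

def channel_classes(route):
    """Virtual-channel class used on each hop of a route of (x, y) nodes."""
    classes, k = [], 0
    for a, b in zip(route, route[1:]):
        classes.append(k)               # class = negative hops taken so far
        if level(*b) < level(*a):       # negative hop: descend a level
            k += 1
    return classes

# A 3-hop route alternates levels 0,1,0,1: only the middle hop is negative.
hops = channel_classes([(0, 0), (1, 0), (2, 0), (2, 1)])
# hops == [0, 0, 1]
```

Because the class sequence is monotone along every route, a message can never wait (directly or transitively) on a buffer class it has already left behind, which is the standard deadlock-freedom argument for this family of schemes.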