Title: Benchmarking parallel simulation algorithms
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472248
L. Barriga, R. Rönngren, R. Ayani
Parallel simulation has been an active research area for more than a decade. The parallel simulation community needs a common benchmark suite for performance evaluation of parallel simulation environments. Evaluating a parallel simulation environment is harder than evaluating a parallel processing system, since the underlying system consists not only of the architecture and operating system but also of the simulation kernel. Simulation kernel designers therefore face a twofold task: (i) to evaluate how efficiently their simulation kernel runs on certain architectures; and (ii) to evaluate how simulation problems scale using this kernel. In this paper we advocate an incremental benchmarking methodology focused on the evaluation of a parallel simulation system based on Time Warp. We start from a reduced set of ping models that can effectively estimate the various overheads, contention and latencies of Time Warp running on a multiprocessor. The benchmark suite has been used to locate several sources of overhead in an existing Time Warp implementation. Using this benchmark suite we also compare the performance of the improved version of the Time Warp implementation with the original one.
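To make the notion of a ping model concrete, here is a minimal sequential sketch (hypothetical Python; the function name and parameters are illustrative, not part of the paper's benchmark suite): two logical processes bounce a single timestamped event back and forth, so the measured event rate isolates the simulator's per-event overhead rather than model computation.

```python
import heapq
import time

def ping_model(num_events=100_000, lookahead=1.0):
    """Two LPs exchange one timestamped message back and forth.

    Because each event does no model computation, the events/second
    figure mostly reflects the simulator's per-event overhead
    (queue operations, message handling), which is what a ping
    benchmark is meant to isolate.
    """
    event_queue = []                      # (timestamp, destination LP)
    heapq.heappush(event_queue, (0.0, 0))
    processed = 0

    wall_start = time.perf_counter()
    while processed < num_events:
        timestamp, lp = heapq.heappop(event_queue)
        # The only "model" work: bounce the event to the other LP.
        heapq.heappush(event_queue, (timestamp + lookahead, 1 - lp))
        processed += 1
    wall_elapsed = time.perf_counter() - wall_start

    return processed / wall_elapsed       # committed events per second

if __name__ == "__main__":
    print(f"{ping_model():,.0f} events/s (sequential baseline)")
```

In a Time Warp setting the same model would additionally exercise rollback, state saving and GVT computation, which is how the individual overheads can be teased apart.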
{"title":"Benchmarking parallel simulation algorithms","authors":"L. Barriga, R. Rönngren, R. Ayani","doi":"10.1109/ICAPP.1995.472248","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472248","url":null,"abstract":"Parallel simulation has been an active research area for more than a decade. The parallel simulation community needs a common benchmark suite for performance evaluation of parallel simulation environments. Performance evaluation of a parallel simulation environment is harder than evaluating a parallel processing system, since the underlying system is nor only composed of architecture and operating system, but also of simulation kernel. Thus, simulation kernel designers often confront a twofold task: (i) to evaluate how efficiently their simulation kernel runs on certain architectures; and (ii) to evaluate how simulation problems scale using this kernel In this paper we advocate an incremental benchmarking methodology that focuses on the evaluation of a parallel simulation system which is based on Time Warp. We start from a reduced set of ping models that can effectively estimate the various overheads, contention and latencies of Time Warp running on a multiprocessor. The benchmark suite has been used to locate several sources of overhead in an existing Time Warp implementation. Using this benchmark suite we also compare the performance of the improved version of the Time Warp implementation with the original one.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129181572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: An introduction to the analysis and debug of distributed computations
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472239
E. Fromentin, N. Plouzeau, M. Raynal
Distributed programs are much more difficult to design, understand and implement than sequential or parallel ones. This is mainly due to the uncertainty created by the asynchrony inherent in distributed machines. Appropriate concepts and tools therefore have to be devised to help the programmer of distributed applications. This paper is motivated by the practical problem of distributed debugging. It presents concepts and tools that help the programmer analyze distributed executions. Two basic problems are addressed: the replay of a distributed execution (how to reproduce an equivalent execution despite asynchrony) and the detection of stable or unstable properties of a distributed execution. The concepts and tools presented are fundamental when designing an environment for distributed program development. This paper is essentially a survey presenting the state of the art in replay mechanisms and in the detection of unstable properties on global states of distributed executions.
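Most replay and detection techniques of this kind rely on logical (vector) clocks to order events and to decide which local states can belong to the same consistent global state. A minimal sketch of the standard vector-clock rules, in hypothetical Python (process and function names are illustrative):

```python
# Vector-clock bookkeeping: the usual basis for ordering the events of a
# distributed execution and for testing whether two local states are
# concurrent, i.e. may belong to the same consistent global state.

def local_event(clock, pid):
    """Process pid performs an internal event."""
    clock = clock.copy()
    clock[pid] += 1
    return clock

def send_event(clock, pid):
    """Send: tick the local component and piggyback the clock."""
    clock = local_event(clock, pid)
    return clock, clock.copy()            # (new local clock, timestamp on message)

def receive_event(clock, pid, msg_clock):
    """Receive: component-wise max with the piggybacked clock, then tick."""
    merged = [max(a, b) for a, b in zip(clock, msg_clock)]
    merged[pid] += 1
    return merged

def concurrent(c1, c2):
    """Two events are concurrent iff neither clock dominates the other."""
    return (any(a < b for a, b in zip(c1, c2)) and
            any(a > b for a, b in zip(c1, c2)))

# Example with two processes P0 and P1:
p0 = [0, 0]
p1 = [0, 0]
p0, msg = send_event(p0, 0)               # P0 sends to P1
p1 = receive_event(p1, 1, msg)            # P1 receives
print(concurrent(p0, p1))                 # False: the send happens before the receive
```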
{"title":"An introduction to the analysis and debug of distributed computations","authors":"E. Fromentin, N. Plouzeau, M. Raynal","doi":"10.1109/ICAPP.1995.472239","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472239","url":null,"abstract":"Distributed programs are much more difficult to design, understand and implement than sequential or parallel ones. This is mainly due to the uncertainty created by the asynchrony inherent to distributed machines. So appropriate concepts and tools have to be devised to help the programmer of distributed applications in his task. This paper is motivated by the practical problem called distributed debugging. It presents concepts and tools that help the programmer to analyze distributed executions. Two basic problems are addressed: replay of a distributed execution (how to reproduce an equivalent execution despite of asynchrony) and the detection of a stable or unstable property of a distributed execution. Concepts and tools presented are fundamental when designing an environment for distributed program development. This paper is essentially a survey presenting a state of the art in replay mechanisms and detection of unstable properties on global states of distributed executions.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124526956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Integrating memory consistency models and communication systems
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472235
F. Schon
The shared memory paradigm offers a well-known programming model for parallel systems, but conventional implementations perform poorly when it is used in large-grain or page-based systems. The main problems are (1) the transparent view at the system level, (2) the false sharing caused by locating several consistency units in the same transportation unit, and (3) the fact that high-level software implementations are not integrated within the system architecture. The first point is addressed by annotating programming objects and deriving a specific configuration of system functionalities. The second point is solved by GAME, the General and Autonomous Merging Environment, which allows a multiple-reader, multiple-writer approach. The third point is addressed by three implementation models of GAME. A hardware-based implementation, and even a software-based one, can hide the cost of GAME's local activities behind the network latency.
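For context only: one widely used way to realize a multiple-reader, multiple-writer protocol is twin/diff merging, where each writer records only the words it changed and disjoint updates to the same page are folded together at synchronization, which removes false sharing. The sketch below (hypothetical Python) illustrates that generic idea; it is not a description of GAME's actual mechanism.

```python
# Twin/diff merging for a multiple-writer shared page.
# Each writer diffs its modified copy against the twin it took when it
# started writing; applying the diffs to the master page merges disjoint
# updates without transferring or invalidating the whole page.

PAGE_WORDS = 8

def make_twin(page):
    return list(page)

def make_diff(twin, modified):
    """Record only the words this writer actually changed."""
    return {i: new for i, (old, new) in enumerate(zip(twin, modified)) if old != new}

def apply_diffs(master, *diffs):
    merged = list(master)
    for diff in diffs:
        for index, value in diff.items():
            merged[index] = value
    return merged

master = [0] * PAGE_WORDS

# Two writers modify disjoint words of the same page concurrently.
twin_a, copy_a = make_twin(master), list(master)
twin_b, copy_b = make_twin(master), list(master)
copy_a[1] = 11                             # writer A touches word 1
copy_b[6] = 66                             # writer B touches word 6

master = apply_diffs(master, make_diff(twin_a, copy_a), make_diff(twin_b, copy_b))
print(master)                              # [0, 11, 0, 0, 0, 0, 66, 0]
```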
{"title":"Integrating memory consistency models and communication systems","authors":"F. Schon","doi":"10.1109/ICAPP.1995.472235","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472235","url":null,"abstract":"The shared memory paradigm offers a well known programming model for parallel systems. But it lacks from its bad performance in conventional implementations if it is used in large grain or page based systems. The main problems are (1) the transparent view on the system level, (2) the false sharing caused by locating several consistency units into the same transportation unit, and that (3) high level software implementations are not integrated within the system architecture. The first point is addressed by annotating programming objects and deriving a specific configuration of system functionalities. The second point is solved by GAME, the General and Autonomous Merging Environment which allows a multiple reader, multiple writer approach. The third point is directed by three implementation models of GAME. A hardware based implementation and even a software based implementation are able to hide the costs of the local activities to perform GAME by the network latency.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116225618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Synthesis of systolic arrays from single assignment algorithm
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472164
A. Al-Khalili
A systematic method for mapping single-assignment algorithms onto systolic arrays is presented. The method is based on a space-time mapping of the index sets. We present a method for generating and selecting a valid transform dependency matrix that yields an optimal or near-optimal systolic array once it is mapped. The proposed method increases the visibility of the architecture in terms of processor delay and interprocessor communication at the algorithmic level, so that the designer is able to select a desired array at early stages of the design. An example of the proposed method is given.
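The space-time mapping can be summarized as follows: a transformation matrix maps every index point of the single-assignment algorithm to a (time, processor) pair, and the mapping is valid if every dependency advances time by at least one step. A small illustrative example (hypothetical Python with an assumed schedule and allocation, not taken from the paper):

```python
import numpy as np

# Space-time mapping of index points: row 0 of T is the schedule vector
# (when an index point is computed), row 1 is the allocation vector
# (which processor computes it).
T = np.array([[1, 1],      # schedule   lambda = (1, 1)
              [0, 1]])     # allocation sigma  = (0, 1)

# Dependency vectors of the single-assignment algorithm (columns).
D = np.array([[1, 0],
              [0, 1]])

# Validity: every dependency must advance time by at least one step.
assert np.all(T[0] @ D >= 1), "schedule violates a dependency"

# Map a few index points (i, j) to (time, processor).
for i in range(3):
    for j in range(3):
        t, p = T @ np.array([i, j])
        print(f"index ({i},{j}) -> time {t}, processor {p}")
```

Enumerating candidate matrices and keeping only those that pass the validity test, then scoring them by total execution time and processor count, is the generic selection procedure the paper's method refines.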
{"title":"Synthesis of systolic arrays from single assignment algorithm","authors":"A. Al-Khalili","doi":"10.1109/ICAPP.1995.472164","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472164","url":null,"abstract":"A systematic method of mapping algorithms from single assignment algorithms into systolic arrays is presented. The method is based on a space-time mapping technique of the index sets. We present a method of generation and selection of a valid transform dependency matrix that will yield an optimal or near optimal systolic array once it is mapped. The proposed method increases the visibility of the architecture in terms of processor delay and communication between processors at the algorithmic level, so that the designer is able to select a desired array at early stages of the design. An example of the proposed method is given.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125834884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Handling data skew in parallel hash join computation using two-phase scheduling
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472237
Xiaofang Zhou, M. Orlowska
A large number of parallel join algorithms have been proposed to maintain load balancing in the presence of data skew. However, one important type of data skew, join product skew (JPS), has been little studied. In this paper, a dynamic parallel join algorithm, which employs a two-phase scheduling procedure, is designed to handle the JPS problem. Two sets of scheduling heuristics are studied against various parameters. It is shown that many of the existing algorithms can be regarded as special cases of our algorithm, whose cost depends on the nature of the data skew. The algorithm can cope with JPS, which other algorithms cannot handle, yet it is as efficient as most existing algorithms when JPS is absent.
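As a rough illustration of the two-phase idea (hypothetical Python; the cost model and heuristic are assumptions, not the paper's actual heuristics): phase one estimates each hash partition's join output, which is where join product skew shows up, and phase two assigns partitions to processors largest-first so that heavily skewed partitions are spread across nodes.

```python
import heapq

def two_phase_schedule(build_sizes, probe_sizes, num_procs):
    """Phase 1: estimate per-partition join output (a proxy for JPS).
    Phase 2: longest-processing-time assignment of partitions to processors."""
    # Phase 1: estimated cost of partition i ~ build_i * probe_i.
    costs = sorted(
        ((b * p, i) for i, (b, p) in enumerate(zip(build_sizes, probe_sizes))),
        reverse=True,
    )
    # Phase 2: greedy LPT -- always give the next-largest partition to the
    # currently least-loaded processor.
    loads = [(0, proc, []) for proc in range(num_procs)]
    heapq.heapify(loads)
    for cost, part in costs:
        load, proc, parts = heapq.heappop(loads)
        heapq.heappush(loads, (load + cost, proc, parts + [part]))
    return sorted(loads, key=lambda x: x[1])

# One heavily skewed partition (0) plus several small ones.
schedule = two_phase_schedule(
    build_sizes=[5000, 100, 120, 90, 110, 95],
    probe_sizes=[8000, 110, 100, 95, 105, 90],
    num_procs=3,
)
for load, proc, parts in schedule:
    print(f"processor {proc}: partitions {parts}, estimated cost {load}")
```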
{"title":"Handling data skew in parallel hash join computation using two-phase scheduling","authors":"Xiaofang Zhou, M. Orlowska","doi":"10.1109/ICAPP.1995.472237","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472237","url":null,"abstract":"A large number of parallel join algorithms has been proposed to maintain load-balancing in the presence of data skew. However, one important type of data skew-join product skew (JPS)-has been little studied. In this paper, a dynamic parallel join algorithm, which employs a two-phase scheduling procedure, is designed to handle the JPS problem. Two sets of scheduling heuristics are studied against various parameters. It is shown that many of the existing algorithms can be regarded as a special case of our algorithm, whose cost is based on the nature of data skew. While it can cope with JPS which other algorithms cannot approach, it can be as efficient as most existing algorithms when JPS does not exist.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127987246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: On the acceleration of stencil operations in the data-parallel solution of PDEs
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472192
D. Harrar
We propose some non-standard yet straightforward and highly effective alternative modes of data assignment which significantly reduce communication volume, and hence execution time, for stencil operations (local iterative updates) implemented within a data-parallel programming environment. Performance results obtained in the solution of two three-dimensional elliptic partial differential equations (PDEs) using iterative methods entailing such updates indicate that substantial performance gains can be realized with these alternative data assignment schemes.
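A simple way to see why data assignment matters for stencils: for a 7-point stencil, the per-sweep communication (halo) volume is proportional to the surface area of each processor's block, so the shape of the assignment, not just its size, determines the communication cost. The sketch below (hypothetical Python/numpy; the decompositions are illustrative and unrelated to the paper's CM-specific layouts) compares a slab and a cubic block of equal volume and shows the local update they surround.

```python
import numpy as np

def halo_volume(block_shape):
    """Words exchanged per sweep for a 7-point stencil: one face layer
    in each of the six directions (interior partitions)."""
    nx, ny, nz = block_shape
    return 2 * (nx * ny + ny * nz + nx * nz)

# Same number of grid points per processor, different shapes
# (256^3 grid split over 64 processors).
slab = (256, 256, 4)      # 1-D (slab) decomposition
cube = (64, 64, 64)       # 3-D (block) decomposition
print("slab halo:", halo_volume(slab))    # 135168 words per sweep
print("cube halo:", halo_volume(cube))    #  24576 words per sweep

def jacobi_sweep(u):
    """One 7-point Jacobi update on the interior of a 3-D array."""
    return (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1] +
            u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1] +
            u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]) / 6.0

u = np.zeros((66, 66, 66))
u[0, :, :] = 1.0                           # a simple boundary condition
u[1:-1, 1:-1, 1:-1] = jacobi_sweep(u)      # local work between halo exchanges
```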
{"title":"On the acceleration of stencil operations in the data-parallel solution of PDEs","authors":"D. Harrar","doi":"10.1109/ICAPP.1995.472192","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472192","url":null,"abstract":"We propose some non-standard, yet straightforward, and highly efficacious alternative modes of data assignment which induce a significant reduction in communication volume and hence in execution time for stencil operations, i.e. local iterative updates, implemented within a data-parallel programming environment. Performance results obtained in the solution of two three-dimensional elliptic partial differential equations (PDEs) using iterative methods entailing such updates indicate that substantial performance increases can be realized using these alternative data assignment schemes.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"518 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133825483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Block-level prediction for wide-issue superscalar processors
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472179
S. Dutta, M. Franklin
Changes in control flow, caused primarily by conditional branches, are a prime impediment to the performance of wide-issue superscalar processors. This paper investigates a block-level prediction scheme to mitigate the effects of control flow changes caused by conditional branches. Instead of predicting the outcome of each conditional branch individually, this scheme predicts the target of a sequential block of instructions, thereby allowing the superscalar processor to go past multiple branches per cycle. The approach is evaluated on the MIPS architecture for 8-way and 12-way superscalar processors, and an improvement in effective fetch size of approximately 15% and 25%, respectively, is observed over identical processors that use branch prediction. No appreciable difference in prediction accuracy was observed, even though block-level prediction must select one out of four outcomes.
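A toy software model of the idea (hypothetical Python; the table organization is assumed and is not the paper's hardware design): rather than a taken/not-taken predictor per branch, keep a table indexed by the address of a fetch block that predicts which of up to four successor blocks comes next, so one prediction covers all the branches inside the block.

```python
# Toy block-level predictor: for each fetch block we remember up to four
# successor blocks and predict the one seen most often, updating counts
# as the real outcome becomes known.
from collections import defaultdict, Counter

class BlockPredictor:
    def __init__(self, max_targets=4):
        self.table = defaultdict(Counter)   # block address -> successor counts
        self.max_targets = max_targets

    def predict(self, block_addr):
        counts = self.table[block_addr]
        return counts.most_common(1)[0][0] if counts else None

    def update(self, block_addr, actual_next_block):
        counts = self.table[block_addr]
        if actual_next_block in counts or len(counts) < self.max_targets:
            counts[actual_next_block] += 1

# Simulated trace of fetch-block transitions: block 0x100 usually falls
# through to 0x140, occasionally branches to 0x200.
trace = [(0x100, 0x140)] * 8 + [(0x100, 0x200)] * 2
predictor, hits = BlockPredictor(), 0
for block, next_block in trace:
    hits += predictor.predict(block) == next_block
    predictor.update(block, next_block)
print(f"accuracy: {hits}/{len(trace)}")
```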
{"title":"Block-level prediction for wide-issue superscalar processors","authors":"S. Dutta, M. Franklin","doi":"10.1109/ICAPP.1995.472179","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472179","url":null,"abstract":"Changes in control flow, caused primarily by conditional branches, are a prime impediment to the performance of wide-issue superscalar processors. This paper investigates a block-level prediction scheme to mitigate the effects of control flow changes caused by conditional branches. Instead of predicting the outcome of each conditional branch individually, this scheme predicts the target of a sequential block of instructions, thereby allowing the superscalar processor to go past multiple branches per cycle. This approach is evaluated using the MIPS architecture, for 8-way and 12-way superscalar processors, and an improvement in effective fetch size of approximately 15% and 25%, respectively, over identical processors that use branch prediction is observed. No appreciable difference in the prediction accuracy was observed, although block-level prediction predicted one out of four outcomes.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132844176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Designing a new encryption method for optimum parallel performance
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472276
K. C. Posch, R. Posch
This paper describes the design process, from algorithm design to the chip level, for a parallel implementation of a modified version of the RSA encryption method. The final system consists of several dozen custom chips computing modular exponentiation based on residue number system coding. Emphasis is put on the hierarchical design view, its benefits and its shortcomings.
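The attraction of residue number system (RNS) coding is that a large operand is represented by its residues modulo several pairwise-coprime word-sized moduli, so the long multiplications inside a modular exponentiation split into independent channel operations that can run on separate chips. The sketch below (hypothetical Python with toy moduli; it omits the RSA-specific modular reduction that the hardware performs) shows the principle:

```python
from math import prod

MODULI = (251, 241, 239, 233)             # pairwise coprime, toy-sized
M = prod(MODULI)

def to_rns(x):
    """Represent x by its residues; each channel can live on its own chip."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    """Channel-wise multiplication: no carries between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(residues):
    """Chinese Remainder Theorem reconstruction (needed only at the end)."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    return x % M

def rns_pow(base, exponent):
    """Square-and-multiply where every multiply is a parallel RNS multiply."""
    result, acc = to_rns(1), to_rns(base)
    while exponent:
        if exponent & 1:
            result = rns_mul(result, acc)
        acc = rns_mul(acc, acc)
        exponent >>= 1
    return from_rns(result)

# Valid as long as the true result stays below M; interleaving a modular
# reduction into the RNS domain is exactly the hard part the chips solve.
assert rns_pow(7, 9) == 7 ** 9
print(rns_pow(7, 9))
```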
{"title":"Designing a new encryption method for optimum parallel performance","authors":"K. C. Posch, R. Posch","doi":"10.1109/ICAPP.1995.472276","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472276","url":null,"abstract":"This paper describes the design process from algorithm design to the chip level for a parallel implementation of a modified version of the RSA encryption method. The final system consists of several dozens of custom chips computing module exponentiation based on residue number system coding. Emphasis is put on the hierarchical design view, its benefits and ifs shortcomings.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133864644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Vectoring the N-body problem on the CM-5
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472279
F. Wang, Young-il Choo
We develop an optimized program for the N-body problem on the CM-5 with vector units. The work aims to make full use of the vector pipelines provided by the CM-5's vector units to improve computational performance. Some development issues in using the vector units are discussed. The code is written in CDPEAC, an assembly-like language that can be called from C. Performance data and some analysis results are given.
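The computation being vectorized is the all-pairs force evaluation. A rough numpy sketch (hypothetical; it stands in for, and is not equivalent to, the CDPEAC vector code) shows the structure: the pairwise interactions become long, regular array operations, which is exactly the kind of work the CM-5 vector units stream through.

```python
import numpy as np

def accelerations(pos, mass, softening=1e-3):
    """Direct-sum gravitational accelerations for N bodies.

    The pairwise distance and force computations are expressed as whole-array
    operations; on the CM-5 the analogous inner loops are what the vector
    units execute.
    """
    # pos: (N, 3), mass: (N,)
    delta = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]       # (N, N, 3)
    dist2 = np.sum(delta ** 2, axis=-1) + softening ** 2        # (N, N)
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)                               # no self-force
    return np.einsum("ijk,ij,j->ik", delta, inv_d3, mass)       # (N, 3)

rng = np.random.default_rng(0)
pos = rng.standard_normal((256, 3))
mass = rng.uniform(0.5, 1.5, size=256)
acc = accelerations(pos, mass)
print(acc.shape)                                                # (256, 3)
```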
{"title":"Vectoring the N-body problem on the CM-5","authors":"F. Wang, Young-il Choo","doi":"10.1109/ICAPP.1995.472279","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472279","url":null,"abstract":"We develop an optimized program for the N-body problem on the CM-5 with vector units. The work is intended to make full use of the power of the vector pipelines provided by the CM-5 equipped with vector units to improve the computation performance. Some development issues using the vector units are discussed. The code is written in CDPEAC, an assembly-like language which can be called from C. Performance data and some analysis results are given.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"127 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113996850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Dynamic bandwidth allocation for VBR video sources in ATM based BISDN
Pub Date: 1995-04-19 | DOI: 10.1109/ICAPP.1995.472175
Young-Chon Kim, Pal-Jin Lee, D. Choi, Byung-Ok Kim, Sungwan Park, Young-sun Kim
With variable bit rate (VBR) video sources, adjacent slices in a frame are strongly correlated with each other; adjacent frames are likewise correlated (frame correlation). VBR video sources can be statistically characterized by the peak rate, average rate, and standard deviation of the rate of generated cells. By taking these correlative and statistical properties into account, VBR video sources can be transmitted more efficiently by estimating the required bandwidth. In this paper, we propose a scheme that dynamically predicts and allocates transmission bandwidth for VBR video sources in ATM-based B-ISDN. The performance of the proposed scheme is evaluated through simulations. Simulation results show that the proposed scheme is superior to conventional schemes in terms of bandwidth utilization and cell loss rate.
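A much-simplified version of such an allocation rule (hypothetical Python; the window-based estimator and safety factor are assumptions, not the paper's predictor): per renegotiation interval, estimate the next rate from a sliding window of recent cell counts and reserve the mean plus a margin proportional to the standard deviation, capped at the declared peak rate.

```python
from collections import deque
from statistics import mean, stdev

class BandwidthAllocator:
    """Window-based predictor: reserve mean + k * sigma of recent cell rates,
    never more than the declared peak rate."""

    def __init__(self, peak_rate, window=10, k=2.0):
        self.peak_rate = peak_rate
        self.window = deque(maxlen=window)
        self.k = k

    def observe(self, cells_in_interval):
        self.window.append(cells_in_interval)

    def allocate(self):
        if len(self.window) < 2:
            return self.peak_rate              # no history yet: be conservative
        estimate = mean(self.window) + self.k * stdev(self.window)
        return min(estimate, self.peak_rate)

# Frame-by-frame cell counts of a bursty VBR source (scene change at the end).
source = [300, 320, 310, 290, 305, 315, 300, 900]
allocator = BandwidthAllocator(peak_rate=1000)
for cells in source:
    reserved = allocator.allocate()
    allocator.observe(cells)
    print(f"observed {cells:4d} cells, had reserved {reserved:7.1f}")
```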
{"title":"Dynamic bandwidth allocation for VBR video sources in ATM based BISDN","authors":"Young-Chon Kim, Pal-Jin Lee, D. Choi, Byung-Ok Kim, Sungwan Park, Young-sun Kim","doi":"10.1109/ICAPP.1995.472175","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472175","url":null,"abstract":"With variable bit rate (VBR) video sources, adjacent slices in a frame are strongly correlated with each other. This is also the case for the frame represented by frame correlation. VBR video sources can be statistically characterized by peak rate, average rate, and standard deviation of the rate of generated cells. Taking account of each correlative and statistical properties, VBR video sources can be more efficiently transmitted by estimating the required bandwidth. In this paper, we propose a scheme that predicts and allocates dynamically transmission bandwidth for VBR video sources in ATM based BISDN. The performance of the proposed scheme is evaluated through simulations. Simulation results show that the proposed scheme is superior to the conventional ones in terms of bandwidth utilization and cell loss rate.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124094099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}