Title: Special purpose neurocomputers: an automatic design approach
Authors: A. Basaglia, W. Fornaciari, F. Salice
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651532
In: Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing (ICAPP 1997)
Abstract: A methodology for designing a digital special-purpose neurocomputer implementing feedforward multilayer neural networks is presented. The design flow consists of three stages: weight discretization, which relaxes the precision requirements while maintaining compatibility with the original model; architectural synthesis, which transforms the abstract description into an optimized digital structure; and VHDL model generation, which produces the VHDL description of the neurocomputer from a set of parametric components.
Title: Update based distributed shared memory integrated into RHODOS' memory management
Authors: J. Silcock, A. Gościński
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651494
Abstract: The DSM system proposed in this paper is implemented entirely at the operating-system level, as a component of RHODOS' Memory (Space) Manager. It is integrated with RHODOS' existing invalidation-based DSM, allowing programmers to choose the consistency protocol best suited to their application. These factors enable RHODOS DSM to provide the user with a transparent, efficient and scalable shared-memory programming environment. We describe the logical design, implementation and performance study of an update-based DSM that strictly adheres to these criteria, which let the user program with a familiar model while taking advantage of the greater scalability of clusters of workstations (COWs).
Title: Modeling and evaluation of a new cluster-based system for commercial applications
Authors: W. Hahn, Suk-Han Yoon, Kangwoo Lee, M. Dubois
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651489
Abstract: We model and evaluate SPAX, a new parallel processing system for commercial applications. SPAX cost-effectively overcomes the limitations of the SMP by providing both the scalability of a parallel processing system and the application portability of an SMP. To investigate whether the new architecture satisfies the requirements of commercial applications such as OLTP, we built system and workload models. Simulation results show that the IO subsystem becomes the bottleneck before the newly developed system network does. We find that SPAX can still meet the IO requirements of the OLTP workload, since its network and IO nodes support a flexible IO subsystem in terms of the number of disk drives and IO nodes relative to the number of processing nodes.
Title: Subtorii allocation strategies for torus connected networks
Authors: S. Gupta, P. Srimani
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651498
Abstract: In this paper we investigate the problem of scheduling n independent jobs on an m×m torus-based network. We develop a model to quantify the effect of contention for communication links on the dilation of job execution time when multiple jobs share communication links.
Title: Generating efficient parallel code for successive over-relaxation
Authors: P. Tang
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651517
Abstract: A complete suite of algorithms for parallelizing compilers to generate efficient SPMD code for SOR problems is presented. By applying a unimodular transformation before loop tiling and parallelization, the number of messages per iteration per processor is reduced from 3^n − 1 in the conventional parallel SOR algorithm to 2^n − 1, where n is the dimensionality of the data set. To maintain memory scalability, a novel approach that uses the local dynamic memory of parallel processors to implement the skewed data set is proposed.
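The message counts quoted in this abstract follow from counting neighboring tiles in an n-dimensional decomposition; a minimal sketch (the interpretation of 3^n − 1 as all adjacent tiles including diagonals is an assumption, not stated in the abstract):

```python
# Messages per iteration per processor in n-dimensional SOR tiling,
# per the abstract: 3**n - 1 with conventional tiling (every adjacent
# tile, diagonals included) versus 2**n - 1 after the unimodular
# transformation described in the paper.
def conventional_msgs(n: int) -> int:
    return 3**n - 1

def transformed_msgs(n: int) -> int:
    return 2**n - 1

for n in (1, 2, 3):
    print(f"n={n}: {conventional_msgs(n)} -> {transformed_msgs(n)}")
```

For a 2-D data set this is 8 messages reduced to 3, and for 3-D, 26 reduced to 7.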
Title: ATME: a parallel programming environment for applications with conditional task attributes
Authors: Lin Huang, M. Oudshoorn
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651497
Abstract: Parallel applications with varying usage patterns present a significant challenge to programmers, in that the spawning of tasks and the communication between them may be conditional (termed "conditional parallel programming"). Ideally, the programmer should not be burdened with operational issues that have little relationship to the application itself. This paper proposes a new parallel programming environment, ATME, to automate task scheduling in conditional parallel programming. By adaptively producing accurate estimates of the task model prior to execution, ATME adjusts task distribution to improve system and application performance.
Title: Parallel neural network training on Multi-Spert
Authors: P. Farber, K. Asanović
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651531
Abstract: Multi-Spert is a scalable parallel system built from multiple Spert-II nodes which we have constructed to speed error backpropagation neural network training for speech recognition research. We present the Multi-Spert hardware and software architecture, and describe our implementation of two alternative parallelization strategies for the backprop algorithm. We have developed detailed analytic models of the two strategies which allow us to predict performance over a range of network and machine parameters. The models' predictions are validated by measurements for a prototype five-node Multi-Spert system. This prototype achieves a neural network training performance of over 530 million connection updates per second (MCUPS) while training a realistic speech application neural network. The model predicts that performance will scale to over 800 MCUPS for eight nodes.
Title: Parallel implementation of synthetic aperture radar on high performance computing platforms
Authors: Jinwoo Suh, M. Ung, Viktor K. Prasanna
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651522
Abstract: We present a high-throughput implementation of synthetic aperture radar (SAR) processing on high performance computing (HPC) platforms. In our implementation, the processors are divided into two groups of sizes M and N: the first group of M processors computes the frequency-domain convolution (FDC) in the range dimension, and the second group of N processors computes the FDC in the azimuth dimension. M and N are determined by the computational requirements of the FDC in the range and azimuth dimensions, respectively. The key contribution of this paper is a general high-throughput M-to-N communication algorithm, a basic communication primitive used in many signal processing applications when a software task pipeline is employed to obtain high throughput. Our algorithm reduces the number of communication steps to lg(N/M + 1) + n(k − 1), where k ≥ 2 and n = ⌈lg_k M⌉. Implementation results on the IBM SP2 and the Cray T3D based on the MITRE real-time benchmarks are presented. The results show that, for a 1K×1K image, the minimum number of processors required to process the SAR benchmarks can be reduced by 50% by using the proposed communication algorithm.
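The step-count formula can be evaluated directly; a minimal sketch, assuming "lg" means log base 2 and that both terms round up to whole steps (neither assumption is spelled out in the abstract):

```python
import math

# Communication steps for the M-to-N algorithm as given in the abstract:
# lg(N/M + 1) + n*(k - 1), with k >= 2 and n = ceil(log_k(M)).
# Assumed: lg is log base 2, and fractional step counts round up.
def mn_comm_steps(M: int, N: int, k: int = 2) -> int:
    n = math.ceil(math.log(M, k)) if M > 1 else 0
    return math.ceil(math.log2(N / M + 1)) + n * (k - 1)

# Hypothetical group sizes for illustration: M=4 range processors,
# N=12 azimuth processors, radix k=2.
print(mn_comm_steps(4, 12))
```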
Title: An efficient local address generation for the block-cyclic distribution
Authors: Oh-Young Kwon, Tae-Geun Kim, T. Han, Sung-Bong Yang, Shin-Dug Kim
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651507
Abstract: To generate local addresses for an array section A(l:h:s) with a block-cyclic distribution, an efficient compilation method is required. In this paper, two local address generation methods for the block-cyclic distribution are presented. One is a simple method modified from the virtual-block scheme; the other is a linear-time ΔM-table construction method. The array elements of A(l:h:s) accessed at run time form a family of lines, and by using the equations of these lines a ΔM table can be generated in O(k) time. Experimental results show that the simple method performs poorly, while the linear-time ΔM-table method is faster than other algorithms in both table generation time and access time for 10,000 array elements.
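For readers unfamiliar with the problem this abstract addresses, a naive reference sketch (this is the brute-force enumeration the paper's ΔM-table method is designed to avoid, not the paper's algorithm; block size b, processor count P and the mapping convention are assumptions):

```python
# Brute-force local address generation for a block-cyclic distribution:
# global index g lives in block g // b; blocks are dealt round-robin to
# P processors, so g belongs to processor (g // b) % P, and its local
# address is its position within that processor's stored blocks.
def local_elements(l, h, s, b, P, p):
    out = []
    for g in range(l, h + 1, s):          # elements of the section A(l:h:s)
        block, offset = divmod(g, b)
        if block % P == p:                # element is on processor p
            local = (block // P) * b + offset
            out.append((g, local))
    return out

# Example: section A(0:19:3), block size 4, 2 processors, processor 0.
print(local_elements(0, 19, 3, 4, 2, 0))
```

An efficient method computes the stride pattern between successive local addresses (the ΔM table) instead of scanning every global index.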
Title: Adaptive routing for a bus-based multiprocessor
Authors: V. Fazio
Pub Date: 1997-12-10 | DOI: 10.1109/ICAPP.1997.651478
Abstract: This paper describes and compares an implementation of an unusual hot-spot-resistant adaptive routing architecture, and evaluates its performance.