Latest articles from Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing

A fast parallel sorting algorithm on the k-dimensional reconfigurable mesh

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651519

Ju-wook Jang, Kichul Kim

We present a new parallel sorting algorithm on the k-dimensional reconfigurable mesh, a generalization of the well-studied two-dimensional reconfigurable mesh. We introduce a new mapping technique that combines the enlarged bandwidth of the multidimensional mesh with the features of the reconfigurable mesh. Using this mapping technique, we show that $N^k$ numbers can be sorted in $O(4^k)$ time (constant time for small k) on a (k+1)-dimensional reconfigurable mesh of size $N \times N \times \cdots \times N$ (k+1 factors). In addition, we show that the number of 1's in a k-dimensional 0/1 array of size $N \times N \times \cdots \times N$ (k factors) can be computed in $O(\log^* N + \log k)$ time on a k-dimensional reconfigurable mesh of size $N \times N \times \cdots \times N$ (k factors).

Citations: 2

Parallelization of the H.261 video coding algorithm on the IBM SP2(R) multiprocessor system

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651523

N. Yung, K. Leung

This paper describes the parallelization of the H.261 video coding algorithm on the IBM SP2 multiprocessor system. With domain decomposition as a framework, data partitioning, data dependencies and communication issues are carefully assessed. From this assessment, two parallel algorithms were developed. The first maximizes processor utilization and the second minimizes communication. Our analysis shows that the first algorithm exhibits poor scalability and high communication overhead, whereas the second exhibits good scalability and low communication overhead. A best median speedup of 13.72, or 11 frames/sec, was achieved on 24 processors.
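
As a rough sanity check on the figures quoted above, the sketch below computes the parallel efficiency implied by a speedup of 13.72 on 24 processors. It also estimates a single-processor frame rate, under the assumption (not stated in the abstract) that the 11 frames/sec figure comes from the same run as the 13.72 speedup. The function names are illustrative only.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / processors


def implied_serial_rate(parallel_rate: float, speedup: float) -> float:
    """Single-processor rate, ASSUMING both figures describe the same run."""
    return parallel_rate / speedup


eff = efficiency(13.72, 24)
serial_fps = implied_serial_rate(11.0, 13.72)
print(f"efficiency: {eff:.2f}, implied serial rate: {serial_fps:.2f} frames/sec")
```

Under that assumption, the reported result corresponds to a little under 60% of ideal linear speedup.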

Citations: 9

Parallelization of IP-packet filter rules

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651506

Takeshi Miei, M. Maruyama, T. Ogura, N. Takahashi

A compiler for parallelizing IP-packet filter rules is presented, with the aim of improving network security while reducing the degradation in packet-forwarding performance. It analyzes the interdependence of the packet-filtering rules specified by a network administrator and translates them into an intermediate program whose instructions can be executed in parallel. Three types of compiler operations are introduced: division splits the rules into parallel expressions, simplification simplifies redundant rules, and deletion removes infeasible rules.

Citations: 3

Efficient run-time scheduling for parallelizing partially parallel loops

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651508

Tsung-Chuan Huang, Po-Hsueh Hsu, Tze-Nan Sheng

We propose an efficient run-time technique to find an optimal parallel execution schedule for partially parallel loops, in which synchronization between iterations is needed to ensure correct program semantics. For efficiency, we combine the conventional mark phase and scheduler phase into a single parallel scheduler. The scheduler divides the loop iterations into several chunks and then executes the iterations within each chunk in parallel. Our scheme not only runs fast but also achieves an optimal schedule. In addition, an atomic bit-vector operation is introduced to avoid global synchronization overhead and to ensure that the larger wavefront number is kept when the wavefront number of an iteration is updated concurrently during scheduling.
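
The wavefront idea can be illustrated with a minimal sequential sketch. This is not the authors' parallel scheduler and it omits the atomic bit-vector operation. It assumes that the set of array elements each iteration reads and writes is known at run time. The function name `wavefronts` and the input format are hypothetical. Iterations that receive the same wavefront number have no dependence on each other and could run in one parallel chunk.

```python
def wavefronts(accesses):
    """Assign a wavefront number to each loop iteration.

    accesses[i] = (reads, writes): sets of array elements touched by iteration i.
    (Hypothetical input format, chosen for illustration.)
    """
    last_write = {}   # element -> wavefront of the latest iteration that wrote it
    last_access = {}  # element -> largest wavefront of any iteration that touched it
    result = []
    for reads, writes in accesses:
        level = 0
        for x in reads:    # flow dependence: must follow the last writer
            level = max(level, last_write.get(x, 0))
        for x in writes:   # anti/output dependence: must follow every earlier access
            level = max(level, last_access.get(x, 0))
        level += 1
        result.append(level)
        for x in writes:
            last_write[x] = level
        for x in reads | writes:
            # keep the larger wavefront number, as in the abstract's update rule
            last_access[x] = max(last_access.get(x, 0), level)
    return result


# Loop body A[w[i]] = A[r[i]] + 1 with w = [0, 1, 2, 1, 3] and r = [5, 0, 0, 2, 1]
loop = [({5}, {0}), ({0}, {1}), ({0}, {2}), ({2}, {1}), ({1}, {3})]
schedule = wavefronts(loop)
print(schedule)
```

In this example iterations 1 and 2 both depend only on iteration 0, so they share a wavefront and form one chunk.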

Citations: 5

Multiple dependent queries execution using critical path scheduling in parallel databases

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651534

K.H. Liu, C. Leung, Y. Jiang

Multiple processors are employed to improve the performance of database systems, and parallelism can be exploited at three levels in query processing: intra-operation, inter-operation, and inter-query parallelism. Intra-operation and inter-operation parallelism are together called intra-query parallelism, which has been studied extensively. In contrast, inter-query parallelism has received little attention, particularly for multiple dependent queries. We develop a decompression algorithm, CPS, for handling multiple dependent queries represented by a directed graph. The algorithm makes use of the activity analysis of critical path analysis and the resource scheduling and levelling of project management. A simulation study shows that the proposed algorithm outperforms other existing methods and is able to provide a globally optimal solution when a sufficient number of processors is available.
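
For readers unfamiliar with critical path analysis, the following sketch computes the critical path of a small dependency graph of queries. It is a textbook longest-path computation, not the CPS algorithm itself, and it ignores resource scheduling and levelling. The query names, durations and the `critical_path` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, depends_on):
    """Return (total length, path) of the longest dependency chain.

    durations: query -> estimated run time
    depends_on: query -> set of queries that must finish first
    """
    graph = {q: set(depends_on.get(q, ())) for q in durations}
    finish, prev = {}, {}
    for q in TopologicalSorter(graph).static_order():
        # the prerequisite that finishes last determines when q can start
        latest = max(graph[q], key=lambda d: finish[d], default=None)
        start = finish[latest] if latest is not None else 0
        finish[q] = start + durations[q]
        prev[q] = latest
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


durations = {"Q1": 3, "Q2": 2, "Q3": 4, "Q4": 1}
depends_on = {"Q3": {"Q1"}, "Q4": {"Q2", "Q3"}}
length, path = critical_path(durations, depends_on)
print(length, path)
```

Queries off the critical path (here Q2) have slack, which is what a scheduler can exploit when processors are limited.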

Citations: 0

A fibre channel-based architecture for Internet multimedia server clusters

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651512

Shenze Chen, M. Thapar

In this paper, we present a cluster architecture for Internet multimedia servers, which uses the Fibre Channel (FC) technology to overcome some of the shortcomings of existing architectures. We also explore the design issues of an FC-based multimedia server cluster. A significant advantage of the FC-based cluster is that it allows physical storage attachment to the interconnect. Because of this feature, FC-based clusters will change the fundamental data-sharing paradigm of existing clusters by eliminating remote data accesses in a cluster. Many aspects of this architecture are critical to real-time multimedia applications, such as audio and video services.

Citations: 12

Artificial neural architecture for real time modelling applications

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651529

E. Petriu, A. Guergachi, G. Patry, L. Zhao, D. Petriu, G. Vukovich

This paper presents the random-pulse machine concept and shows how it can be used for the modular design of artificial neural networks. Random-pulse machines deal with analog variables represented by the mean rate of random-pulse streams and use simple digital technology to perform arithmetic and logic operations. As an application example, a neural network is proposed for modeling activated sludge wastewater treatment plants.
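
The idea of representing a value by the mean rate of a random pulse stream can be sketched in a few lines. The example assumes a simple unipolar encoding in which a value in [0, 1] is the probability that a bit is 1. This is a common convention in stochastic computing and may differ from the encoding used in the paper. With independent streams, a bitwise AND then approximates multiplication. All function names are illustrative.

```python
import random


def encode(value, length, rng):
    """Random pulse stream whose mean rate approximates value (0 <= value <= 1)."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(stream):
    """Recover the represented value as the mean pulse rate."""
    return sum(stream) / len(stream)


def multiply(a, b):
    """AND gate per bit: for independent streams the output rate is rate(a) * rate(b)."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(0)  # fixed seed so the run is repeatable
a = encode(0.5, 20000, rng)
b = encode(0.4, 20000, rng)
product = decode(multiply(a, b))
print(f"estimated 0.5 * 0.4 = {product:.3f}")
```

The estimate is only approximate, and its accuracy improves with stream length, which is the usual trade-off of this kind of representation: very simple logic at the cost of long streams.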

Citations: 0

HiPAR-DSP: a parallel VLIW RISC processor for real time image processing applications

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651487

J. Wittenburg, M. Ohmacht, J. Kneip, W. Hinrichs, P. Pirsch

Derived from a thorough analysis of the properties of a wide class of image processing algorithms, a parallel RISC architecture has been developed. The architecture gains performance from data-level parallelism as well as from instruction-level parallelism. From the beginning of the concept phase, high-level programming capabilities have been one of the major design goals. Thus, there has been steady interaction between the design of the software development toolkit (optimizing assembler and C++ compiler) and the architecture itself. The RISC-typical register files are among the most critical elements, with respect both to die size and clock frequency and to the assembler's VLIW scheduling ability. Running at 100 MHz (200 mm², 0.35 µm CMOS), the processor reaches a sustained performance of more than 2 GOPS for a wide range of image processing algorithms.

Citations: 12

Shadow Stacks - a hardware-supported DSM for objects of any granularity

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651493

S. Groh, M. Pizka, J. Rudolph

This paper presents a new Distributed Shared Memory (DSM) management concept that is integrated into a scalable distributed virtual memory management technique and circumvents false sharing while preserving simplicity at the application level. Objects defined as usual by variables in the declaration part of functions are made sharable among threads executing in the distributed environment. These objects, of varying granularity and with different consistency requirements, are managed separately to avoid false sharing. Consistency is enforced at runtime by a distributed manager-agent architecture that supports automatic and dynamic selection of an adequate coherence protocol per object. For efficiency, the implementation of the Shadow Stacks concept exploits the page fault mechanism provided by off-the-shelf hardware.

Citations: 9

Virtual parallel processors

Pub Date : 1997-12-10 DOI: 10.1109/ICAPP.1997.651485

C. Dick, F. Harris

The introduction of SRAM-based field programmable gate arrays (FPGAs) has opened up a new dimension in parallel computing architectures. This paper describes an alternative approach to parallel computing: reconfigurable, or virtual, parallel processing (VPP). Rather than mapping an application onto a given parallel machine, the VPP approach synthesizes the type and number of processing elements, as well as the interconnection topology, that are optimal for the application. For each application, configuration data is downloaded to the machine to personalize the hardware for the task at hand. The paper provides a brief description of the authors' reconfigurable computer, Archimedes. The benefits of the VPP approach are highlighted by an example application, the 2-D FFT. A novel parallel implementation of a polynomial-transform-based 2-D transform is described and compared with results reported in the literature for distributed memory parallel machines. The comparison highlights the computational advantage provided by reconfigurable computing.

Citations: 1
