Pub Date: 1995-05-01 | DOI: 10.1016/0165-6074(94)00090-W
Constantinos V. Papadopoulos
Wormhole message routing is supported by the communication hardware of several distributed memory machines. This method of message routing has numerous advantages but creates the problem of a routing deadlock. When long messages compete for the same channels in the network, some messages are blocked until the first message is fully consumed by the processor at its destination. A deadlock occurs if a set of messages mutually block one another so that no message can progress towards its destination. Most previously known deadlock-free routing schemes are designed to work on regular binary hypercubes, a very special case of multicomputer interconnection networks. However, these routing schemes do not provide enough flexibility to deal with the irregular 2-D tori and attached auxiliary cells found on many newer parallel systems.
To handle irregular topologies elegantly, a simple proof is necessary to verify the router code. The new proof given in this report is carried out directly on the network graph. It is constructive in the sense that it reveals the design options to deal with irregularities and shows how additional flexibility can be used to achieve better load balancing.
Based on the modified routing model, a set of deadlock-free router functions relevant to the iWarp system configurations is described and proven correct.
"On the routing of signals in parallel processor meshes", Microprocessing and Microprogramming 41(2), pp. 171-189.
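The deadlock argument above can be made concrete with the textbook case: on a regular 2-D mesh, dimension-order (X-then-Y) routing is deadlock-free because channels are acquired in one fixed global order. The sketch below is this standard illustration, not the paper's construction for irregular tori:

```python
def xy_route(src, dst):
    """Channel sequence for dimension-order (X-then-Y) routing on a 2-D mesh.

    Every message corrects its X coordinate before touching any Y channel,
    so channel acquisition follows one global order and no cyclic wait
    (routing deadlock) can arise among blocked wormhole messages.
    """
    (x, y), (dx, dy) = src, dst
    path = []
    step = 1 if dx > x else -1
    while x != dx:                        # X phase first
        path.append(((x, y), (x + step, y)))
        x += step
    step = 1 if dy > y else -1
    while y != dy:                        # then Y phase
        path.append(((x, y), (x, y + step)))
        y += step
    return path

# three hops: east, east, north
hops = xy_route((0, 0), (2, 1))
```

The paper's point is precisely that such a fixed-order argument is too rigid for irregular 2-D tori with auxiliary cells, which is why its proof is carried out directly on the network graph instead.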
Pub Date: 1995-05-01 | DOI: 10.1016/0165-6074(95)00007-B
C.P. Ravikumar, Naresh Vedi
We consider the problem of mapping tasks onto processors in a reconfigurable array architecture. We assume a directed acyclic task graph as input. The node weights in the task graph represent the tasks' computational requirements; the weight on an edge (i, j) is an estimate of the communication requirement between tasks i and j. The problem is to (a) estimate the minimum number of processors p to execute all the tasks with the highest possible efficiency, (b) bind each task to a processor, (c) schedule the tasks within each processor, and (d) carry out link allocation among processors. We assume a realistic model of reconfigurable parallel processors, where each processor can be connected to at most d other processors through bidirectional links. The objective is to minimize the overall execution time, which includes the time spent by the processors in computation, communication, and idling. The mapping problem is computationally hard, and we present two algorithms for obtaining near-optimal solutions. The first is a heuristic algorithm based on the critical path method and as-soon-as-possible (ASAP) scheduling. The second uses the Boltzmann machine model of artificial neural networks to solve the mapping problem. We have implemented both algorithms on a Sun/SPARC workstation. Experimental results on a set of benchmark problems indicate that the neural algorithm generates better solutions than the heuristic algorithm but requires significantly more time. The number of neurons required is n·p, and hence the connection matrix is np × np; the neural algorithm is therefore also memory- and I/O-intensive due to swapping. We have devised a parallel divide-and-conquer algorithm which decomposes a large mapping problem into several smaller ones and solves the subproblems concurrently on a network of Sun workstations.
"Heuristic and neural algorithms for mapping tasks to a reconfigurable array", Microprocessing and Microprogramming 41(2), pp. 137-151.
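Of the two components named for the heuristic, ASAP scheduling is the simpler one: each task starts as soon as its last predecessor finishes. A generic sketch (the task names and durations are hypothetical, not from the paper's benchmarks):

```python
def asap_schedule(tasks, deps):
    """As-soon-as-possible start times for a directed acyclic task graph.

    tasks: {name: duration}; deps: {name: [predecessor names]}.
    A task starts the moment its latest-finishing predecessor completes.
    """
    start = {}

    def finish(t):
        # compute (and cache) the start time, then return the finish time
        if t not in start:
            start[t] = max((finish(p) for p in deps.get(t, [])), default=0)
        return start[t] + tasks[t]

    for t in tasks:
        finish(t)
    return start

# diamond-shaped graph: a feeds b and c, both feed d
s = asap_schedule({'a': 2, 'b': 3, 'c': 1, 'd': 2},
                  {'b': ['a'], 'c': ['a'], 'd': ['b', 'c']})
```

The critical-path component would additionally prioritize tasks on the longest path through the graph when processors are contended.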
This paper presents a new global scheduling algorithm for automatic synthesis of the control blocks of special-purpose microprocessors. The main distinction of the proposed algorithm is that it exploits the inherent properties of structured programs. The optimization goal is to maximize the speedup of the processor and minimize the size of the control block. Compared with existing global scheduling algorithms such as Trace scheduling, Tree compaction, and Percolation scheduling, the proposed algorithm consistently achieves better results in terms of processor speedup and control block size.
"A new approach to schedule operations across nested-ifs and nested-loops", by Shih-Hsu Huang, Cheng-Tsung Hwang, Yu-Chin Hsu and Yen-Jen Oyang. Microprocessing and Microprogramming 41(1), pp. 37-52, April 1995. DOI: 10.1016/0165-6074(94)00024-5.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90006-3
Mariagiovanna Sami (Editor-in-Chief), Lutz Richter (Editor-in-Chief)
"Letter from the editors-in-chief", Microprocessing and Microprogramming 41(1), pp. 1-3.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00025-6
Robert Manger, Mladen Grbić, Vito Leonardo Plantamura, Branko Souček
In this paper we describe a parallel version of the Hestenes algorithm for computing the singular value decomposition. We also describe a corresponding implementation on a transputer network. The implementation has been used to accelerate some programs for financial ratio analysis. Empirical results regarding the efficiency of our implementation are also presented.
"A parallel SVD algorithm and its application to financial ratio analysis", Microprocessing and Microprogramming 41(1), pp. 97-106.
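Hestenes' method is the one-sided Jacobi SVD: plane rotations repeatedly orthogonalize pairs of columns, and once all pairs are orthogonal the singular values are the column norms. A minimal serial sketch follows; the paper's contribution is parallelizing the independent column-pair rotations across a transputer network, which this sketch does not attempt:

```python
import math

def hestenes_svd_values(A, sweeps=10, eps=1e-12):
    """Singular values of A via Hestenes' one-sided Jacobi method.

    Each sweep rotates every column pair (p, q) by the angle that zeroes
    their inner product; at convergence the columns are orthogonal and the
    singular values are their Euclidean norms.
    """
    A = [row[:] for row in A]            # work on a copy
    m, n = len(A), len(A[0])
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = sum(A[i][p] * A[i][p] for i in range(m))
                aqq = sum(A[i][q] * A[i][q] for i in range(m))
                apq = sum(A[i][p] * A[i][q] for i in range(m))
                if abs(apq) < eps:       # pair already orthogonal
                    continue
                # angle solving tan(2*theta) = 2*apq / (app - aqq)
                theta = 0.5 * math.atan2(2 * apq, app - aqq)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(m):       # rotate columns p and q in place
                    A[i][p], A[i][q] = (c * A[i][p] + s * A[i][q],
                                        -s * A[i][p] + c * A[i][q])
    return sorted((math.sqrt(sum(A[i][j] ** 2 for i in range(m)))
                   for j in range(n)), reverse=True)
```

Rotations on disjoint column pairs commute, which is what makes the method attractive on parallel machines: up to n/2 pairs can be rotated simultaneously.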
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00017-5
K.W. Ng, C.K. Luk
Functional, object-oriented and logic programming are widely regarded as today's three dominant programming paradigms. For the past decade, many attempts have been made to integrate these three paradigms into a single language. This paper is a survey of this new breed of multiparadigm languages. First we give a succinct introduction to the three paradigms. Then we discuss a variety of approaches to their integration through an overview of some of the existing multiparadigm languages. All possible combinations of the three paradigms, namely logic + object-oriented, functional + logic, functional + object-oriented, and object-oriented + logic + functional, are considered separately. For the purpose of classification, we propose a design space of programming languages called the FOOL-space.
"A survey of languages integrating functional, object-oriented and logic programming", Microprocessing and Microprogramming 41(1), pp. 5-36.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90007-1
"Calendar of forthcoming conference and events", Microprocessing and Microprogramming 41(1), pp. 107-109.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00089-S
Ananta K. Majhi, L.M. Patnaik, Srilata Raman
Multichip Modules (MCMs) are a packaging technology of growing importance because they reduce interconnect delays across chips, bringing those delays closer in magnitude to on-chip delays. The problem here is to partition a circuit across multiple chips, producing MCMs. Partitioning is a combinatorial optimization problem. One method of solving it is with Genetic Algorithms (GAs), search techniques inspired by natural genetics. GAs can be used to solve both combinatorial and functional optimization problems. This paper solves the partitioning problem using the GA approach. The performance of GAs is compared with that of Simulated Annealing (SA) by executing both algorithms on three benchmark circuits. The effect of varying the algorithm's parameters on the performance of GAs is also studied.
"A genetic algorithm-based circuit partitioner for MCMs", Microprocessing and Microprogramming 41(1), pp. 83-96.
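As a hedged illustration of the GA approach (the encoding, operators, and cost weights below are generic textbook choices, not necessarily the authors'), a 2-way partitioner can encode each cell's chip assignment as one bit and penalize cut nets plus imbalance:

```python
import random

def ga_bipartition(edges, n, pop=30, gens=60, pmut=0.05, seed=1):
    """Toy genetic algorithm for 2-way circuit partitioning.

    A chromosome assigns each of n cells to chip 0 or 1; the cost is the
    number of cut edges plus a balance penalty, mirroring the MCM goal of
    few inter-chip wires and evenly loaded chips.
    """
    rng = random.Random(seed)

    def cost(ch):
        cut = sum(ch[u] != ch[v] for u, v in edges)
        return cut + abs(sum(ch) - n / 2)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[:pop // 2]       # truncation selection (elitist)
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cx = rng.randrange(1, n)          # one-point crossover
            child = a[:cx] + b[cx:]
            for i in range(n):                # bit-flip mutation
                if rng.random() < pmut:
                    child[i] ^= 1
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)

# two 3-cell cliques joined by one wire; the minimum cut is the bridge edge
best, c = ga_bipartition([(0, 1), (0, 2), (1, 2),
                          (3, 4), (3, 5), (4, 5), (2, 3)], 6)
```

Simulated annealing, the comparison point in the paper, would instead perturb a single solution and accept worsening moves with a temperature-controlled probability.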
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00088-R
Philippe O.A. Navaux, César A.F. De Rose, Gerson G.H. Cavalheiro
Parallelism is a natural route to real-time image processing. The data parallelism found in array processors simplifies the mapping of such problems, as each processing element works on part of the image. The GAPP board (Geometric Arithmetic Parallel Processor) is a near-neighbor mesh architecture with 144 processors interconnected as a 12 × 12 two-dimensional array. This work analyzes the performance of the GAPP board on image processing. The implementation of two image convolution algorithms is described, and the results obtained, the suitability of the GAPP board for this kind of application, and the performance achieved are discussed. The results of this work are also compared to those presented in [11]. Some ways to achieve better performance from the GAPP array in this kind of application are presented as well.
"Performance evaluation in image processing with GAPP array processor", Microprocessing and Microprogramming 41(1), pp. 71-82.
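Image convolution is the data-parallel kernel in question: each output pixel is a weighted sum over its 3 × 3 neighbourhood, so on the GAPP mesh every processing element can compute its own pixel using only near-neighbour links. A sequential sketch of the same per-pixel computation:

```python
def convolve3x3(image, kernel):
    """3x3 convolution over a 2-D image with zero-padded borders.

    On a GAPP-style mesh each processing element would hold one pixel and
    gather the eight neighbour values over near-neighbour links in lock-step;
    here the identical neighbourhood sum is computed pixel by pixel.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:   # zero padding
                        acc += kernel[ky][kx] * image[iy][ix]
            out[y][x] = acc
    return out
```

On the mesh the outer two loops disappear: all 144 pixels of a 12 × 12 tile are computed simultaneously, with the nine multiply-accumulate steps done in sequence.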
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90627-O
B.A. Coghlan, J.O. Jones
Lack of I/O performance is fast becoming a limiting factor in many computing systems. The yearly doubling of CPU speeds is not being matched by corresponding gains in I/O performance. This paper explores one aspect of the architecture of a high performance fault-tolerant cached RAID subsystem for a multiprocessor. The disk write cache is implemented as a memory-mapped stable memory. The features of a VRAM-based stable memory and its associated RAID controller are discussed.
"Stable memory for a disk write cache", Microprocessing and Microprogramming 41(1), pp. 53-70.
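The core protocol behind such a write cache can be sketched abstractly: acknowledge a write once it is captured in stable (battery-backed or VRAM) memory, destage to disk lazily, and replay surviving log entries after a crash. The class and method names below are hypothetical, and real stable memory is hardware rather than a Python dict:

```python
class StableWriteCache:
    """Illustrative sketch of a stable-memory disk write cache protocol.

    A write is acknowledged as soon as it is recorded in the stable log,
    so it survives a power failure; dirty blocks are destaged to the disk
    lazily and a log entry is retired only after its destage completes.
    """
    def __init__(self):
        self.stable_log = {}   # stands in for battery-backed / VRAM store
        self.disk = {}         # stands in for the RAID array

    def write(self, block, data):
        self.stable_log[block] = data   # durable before acknowledgement
        return True                     # ack: the writer may proceed

    def flush(self):
        for block, data in list(self.stable_log.items()):
            self.disk[block] = data     # destage to disk
            del self.stable_log[block]  # retire entry only after destage

    def recover(self):
        """After a crash, replay whatever survived in stable memory."""
        self.flush()
```

The design point the paper explores is making this log a memory-mapped VRAM region, so the acknowledgement step costs a memory write rather than an I/O operation.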