Microprocessing and Microprogramming: Latest Articles

Algorithm-architecture co-design by example: a coprocessor for on-line arithmetic

Microprocessing and Microprogramming

Pub Date : 1995-10-01 DOI: 10.1016/0165-6074(95)00020-O

Joachim König, Lothar Thiele

Domain-specific architectures are gaining attention in high-performance applications where general-purpose processors cannot achieve the desired throughput. The algorithms to be implemented strongly influence the architecture of the design, and vice versa. A systematic approach based on provably correct transformations, herein called Algorithm-Architecture Co-design, is demonstrated on the design of a processor for long integer arithmetic.

Citations: 1

Parallel programmable architectures and compilation for multi-dimensional processing

Microprocessing and Microprogramming

Pub Date : 1995-10-01 DOI: 10.1016/0165-6074(95)00019-K

F. Catthoor, M. Moonen

In this introduction, we will summarize the main contributions of the papers collected in this special issue. Moreover, the topics addressed in these papers will be linked to the major research trends in the domain of parallel algorithms, architectures and compilation.

Citations: 2

Architecture and C++-programming environment of a highly parallel image signal processor

Microprocessing and Microprogramming

Pub Date : 1995-10-01 DOI: 10.1016/0165-6074(95)00023-H

J. Kneip, M. Ohmacht, K. Rönner, P. Pirsch

A highly parallel single-chip image signal processor architecture has been derived by analysis of image processing algorithms. Available levels of parallelism and their associated demands on data access, control and complexity of operations were taken into account. The RISC architecture, called “HiPAR-DSP”, consists of a control unit, 16 parallel ASIMD-controlled datapaths with autonomous addressing and instruction selection capability, a local data cache per datapath, a shared memory with matrix-type data access and a powerful DMA unit. The proposed architecture was designed by assessing the results of an analysis of characteristic algorithm properties with respect to their inherent parallelization resources, achievable speed-up and implementation costs. This resulted in a proper balance between the degree of parallelism and flexibility, leading to high performance for a wide field of applications. Additional measures were taken to support efficient high-level programmability of the processor. This was achieved by the concurrent implementation of special architectural features and a C++ programming environment. The environment consists of an adaptation of the GNU C++ compiler and an optimizing assembler, supporting all levels of concurrency offered by the hardware. While most levels of parallelization are kept invisible to the programmer, data-level parallelism is expressed by the programmer using special new data types added to the standard C/C++ data types. A sustained performance of about 2.0 Gigaoperations per second is achieved by the 100 MHz clocked processor for numerous image processing algorithms; for example, a normalized correlation of a 512 × 512 image with a 32 × 32 correlation mask takes 450 ms. Thus, a programmable parallel processor architecture achieves a performance that hitherto required a dedicated integrated circuit.
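
The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope estimate: the function names are hypothetical, and the abstract does not say how operations are counted or whether the mask is evaluated only at positions where it fits entirely inside the image, so both variants are offered.

```python
def operation_budget(ops_per_second, seconds):
    """Total operations available at a sustained rate over a time span."""
    return ops_per_second * seconds


def mask_pixel_pairs(image_size, mask_size, valid_only=True):
    """Number of (image pixel, mask pixel) pairs visited by a 2-D correlation.

    valid_only=True assumes the mask is placed only where it fits entirely
    inside the image; valid_only=False assumes one placement per image pixel.
    Which convention the authors used is not stated in the abstract.
    """
    positions_per_axis = image_size - mask_size + 1 if valid_only else image_size
    return positions_per_axis ** 2 * mask_size ** 2


# Figures taken from the abstract: 2.0 Gigaoperations/s, 450 ms, 100 MHz, 16 datapaths.
budget = operation_budget(2.0e9, 0.450)
pairs = mask_pixel_pairs(512, 32)
ops_per_pair = budget / pairs
ops_per_datapath_cycle = 2.0e9 / (16 * 100e6)

print(f"operation budget:           {budget:.3g}")
print(f"mask/image pixel pairs:     {pairs}")
print(f"operations per pixel pair:  {ops_per_pair:.2f}")
print(f"operations per datapath per cycle: {ops_per_datapath_cycle:.2f}")
```

Under the valid-positions assumption there are 236,913,664 pixel pairs, so the quoted time corresponds to roughly 3.8 operations per pair. The sustained rate also works out to 1.25 operations per datapath per clock cycle, which suggests that each datapath issues more than one operation per cycle on average.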

Citations: 8

Mapping real-time motion estimation type algorithms to memory efficient, programmable multi-processor architectures

Microprocessing and Microprogramming

Pub Date : 1995-10-01 DOI: 10.1016/0165-6074(95)99030-9

E. De Greef, F. Catthoor, H. De Man

In this paper, an architectural template is presented that can execute the full search motion estimation algorithm, or other similar video or image processing algorithms, in real time. The architecture is based on a set of programmable video signal processors (VSPs). It is also possible to integrate the processor cores and their local memories on one or more chips. Owing to its programmability, the system is very flexible and can be used to emulate other similar block-oriented local-neighborhood algorithms. The architecture can easily be divided into several partitions, with no data exchange between partitions. Special attention is paid to memory size and transfer optimization, which are the dominant factors in both area and power cost. The trade-offs and techniques used to arrive at these solutions are explained in detail. It is shown that careful optimization can lead to large savings in memory size (up to 66%) and bandwidth requirements (up to a factor of 4) compared to a straightforward solution.
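
For readers unfamiliar with the full search motion estimation algorithm named above, here is a minimal sketch of it in plain Python. It is not the authors' implementation or mapping: the sum of absolute differences (SAD) cost, the function names and the boundary handling are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_dx, best_dy, best_sad). Candidates outside `ref` are skipped;
    ties keep the first candidate found."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every candidate displacement reads a full n x n block of the reference frame, and neighbouring candidates overlap heavily. This repeated access to the same pixels is why memory size and transfer bandwidth dominate the cost of such algorithms.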

Citations: 17

A scalable design for VLSI dictionary machines

Microprocessing and Microprogramming

Pub Date : 1995-10-01 DOI: 10.1016/0165-6074(95)00021-F

Thibault Duboux, Afonso Ferreira, Michel Gastaldo

Most of the VLSI dictionary machines proposed in the literature were designed to fit on a single chip. If the number of acquired elements is larger than the number of VLSI cells, another chip has to be designed and manufactured to accommodate the larger dictionary. In this paper, we propose a new design for dictionary machines that assembles blocks of standard existing dictionary machines. Our machine is as efficient as the best machines described in the literature, with the major advantage that it scales up easily, with no degradation in performance, simply by adding more standard blocks.

Citations: 0

Multilayered neural network implementation on transputer systolic array

Microprocessing and Microprogramming

Pub Date : 1995-08-01 DOI: 10.1016/0165-6074(95)00010-L

Q. Song, E.K. Teoh, D.P. Mital

Performance analysis and comparison are carried out for one- and two-dimensional systolic arrays based on transputers. The one-dimensional array is found to have low efficiency because of communication overhead. The systolic algorithm is extended to the two-dimensional array to achieve full parallelism in each layer's calculation, which speeds up simulation of the network. Experimental results are provided to support the performance evaluation.

Citations: 2

Calendar of forthcoming conferences and events

Microprocessing and Microprogramming

Pub Date : 1995-08-01 DOI: 10.1016/0165-6074(95)90003-9

Citations: 0

Supporting user mobility in the Distributed Logical Machine System

Microprocessing and Microprogramming

Pub Date : 1995-08-01 DOI: 10.1016/0165-6074(95)00011-C

Jim-Min Lin, Shang Rong Tsai

User or program mobility in distributed computing systems is becoming increasingly significant, since users may change their working locations frequently. Job migration supplements remote login in supporting user mobility. However, a migration facility is not yet a common feature of distributed systems, mainly because of the inherent complexity of implementing one. This paper proposes a logical machine migration mechanism that can effectively support software environment migration. The basic idea behind logical machine migration is to migrate a logical machine, including the running processes and their execution environment, by a single mechanism. Thus most of the migration difficulties caused by dependency on the operating system kernel are eliminated. We have realized an experimental system, called DLMS386, which successfully demonstrates this idea.

Citations: 2

An experimental mixed purpose network

Microprocessing and Microprogramming

Pub Date : 1995-08-01 DOI: 10.1016/0165-6074(95)00012-D

R. Posch, F. Pucher

This paper deals with an experimental network application. As an example, an environment with on-screen display of low-density information is selected. Such an environment can be found in hospitals, where patients have to be guided from one place to another, as well as in many other settings, such as airports. The method used is a fiber-based 20 Mbit/s network. To obtain a homogeneous structure, transputer links are used throughout. Both packet-oriented inter-processor communication and low-level bit streams for the video frames can coexist over these links. Uniformity in the physical layer [1] ensures maximum reliability and flexibility. With the use of transputer links, fault detection is inherent in this application. The overall design is a highly distributed, low-cost solution. Interfaces to standard networks are readily available.

Citations: 0

On synthesizing cube and tree for parallel processing

Microprocessing and Microprogramming

Pub Date : 1995-08-01 DOI: 10.1016/0165-6074(95)00015-G

S.K. Basu, J. Datta Gupta, R. Datta Gupta

In this paper we propose a VLSI-implementable architecture, called the Cube Connected Tree (CCT), that combines advantageous properties of both the tree and the hypercube. This structure has a fixed, low node degree for any network size, unlike the hypercube, where the node degree depends on the size of the hypercube. The degree-diameter product metric [26] of the CCT is low compared to that of a hypercube of comparable size. The CCT overcomes the data congestion problem near the root of a binary tree by having multiple roots in the structure, thereby enhancing the I/O bandwidth of the system. The complexity of the VLSI layout of this structure has been addressed within the grid model of Thompson [12]. The fault tolerance of the system has been enhanced by using spare links and PEs. Easy programmability of this structure has been demonstrated by designing polylogarithmic algorithms for sorting and the discrete Fourier transform.
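
To make the hypercube baseline in this comparison concrete, the sketch below measures the degree-diameter product of an n-dimensional hypercube by breadth-first search. It covers the hypercube only: the abstract does not define the Cube Connected Tree, so no attempt is made to construct it, and all function names are hypothetical.

```python
from collections import deque


def hypercube(n):
    """Adjacency lists of the n-dimensional hypercube: node v is linked to
    every node whose label differs from v in exactly one bit."""
    return {v: [v ^ (1 << b) for b in range(n)] for v in range(1 << n)}


def eccentricity(graph, start):
    """Largest shortest-path distance from `start`, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def degree_diameter(graph):
    """Return (maximum node degree, diameter, degree * diameter)."""
    degree = max(len(neighbours) for neighbours in graph.values())
    diameter = max(eccentricity(graph, v) for v in graph)
    return degree, diameter, degree * diameter
```

For a hypercube with 2^n nodes, both the degree and the diameter equal n, so the product grows as n squared. A topology with fixed node degree avoids the growth in the degree term, which is the property the abstract highlights.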

Citations: 1
