
Proceedings 5th Australasian Computer Architecture Conference. ACAC 2000 (Cat. No.PR00512): latest papers

Adaptive middleware for heterogeneous defence networks-an exploratory simulation study
B. McClure, T. Au, J. Indulska
This paper presents the design and evaluation through a discrete event simulation of an ODP-based Adaptive Computing Architecture which manages network resources in large-scale heterogeneous error-prone networks. The emphasis is given to network (communication) adaptation of this architecture simulated for an exemplar defence network. The results show that, for this network, the architecture provides significant improvement in terms of higher priority requests meeting their QoS requirements and adaptation to link failure under heavy link utilisation. In addition, link utilisation is lower with the architecture active.
DOI: https://doi.org/10.1109/ACAC.2000.824324 (published 2000-01-31)
Citations: 0
A scalable re-configurable processor
John Morris, G. Bundell, S. Tham
Several commercial and research projects have produced a variety of 'computing surfaces' based on FPGAs with some interconnection pattern. However, because the majority of these projects have constrained themselves to two-dimensional structures that can be fabricated on a single planar substrate, the interconnect patterns are fixed and severely constrain the ability of a problem to be mapped on to the prototyping system. This paper describes a simple development of the Achilles interprocessor switch. Achilles' 3D stack of processors provides a flexible and scalable system: any number of stacks may be connected together in a small volume, and a user may set up a connection pattern quite different from any envisaged by the hardware designer. Simulation of control systems where there are large numbers of objects, such as traffic flows or network message traffic, is CPU intensive and generally requires inordinately long runs on conventional sequential processors. We therefore chose Petri Net simulation for a feasibility study of Achilles as a reconfigurable processor. The study showed that the architecture is particularly suitable for Petri Net simulations, as hundreds of places in a net can be simultaneously active, reducing the time necessary for simulations by orders of magnitude.
DOI: https://doi.org/10.1109/ACAC.2000.824325 (published 2000-01-31)
Citations: 4
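The Petri Net simulation mentioned in the abstract of "A scalable re-configurable processor" can be illustrated with a small software sketch. This is not the Achilles implementation: the names `Transition`, `enabled`, and `step` are hypothetical, and the greedy conflict resolution (transitions earlier in the list win) is an assumption made only to keep the example short. One call to `step` fires every transition that can fire from the tokens present at the start of the step, which is the kind of simultaneous activity a parallel machine could exploit.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    inputs: dict   # place -> tokens consumed (each count >= 1)
    outputs: dict  # place -> tokens produced


def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in t.inputs.items())


def step(marking, transitions):
    """Fire a greedy maximal set of transitions in one simulated step.

    Tokens produced during the step only become visible in the next step.
    """
    remaining = dict(marking)  # tokens still available for consumption
    produced = {}              # tokens created during this step
    fired = []
    for t in transitions:
        if enabled(remaining, t):
            for p, n in t.inputs.items():
                remaining[p] -= n
            for p, n in t.outputs.items():
                produced[p] = produced.get(p, 0) + n
            fired.append(t.name)
    new_marking = dict(remaining)
    for p, n in produced.items():
        new_marking[p] = new_marking.get(p, 0) + n
    return new_marking, fired


# Hypothetical three-transition net: t1 and t2 are independent, t3 waits on t1.
net = [
    Transition("t1", {"a": 1}, {"b": 1}),
    Transition("t2", {"c": 1}, {"d": 1}),
    Transition("t3", {"b": 1}, {"a": 1}),
]
marking, fired = step({"a": 1, "b": 0, "c": 2, "d": 0}, net)
```

In this example `t1` and `t2` fire together in the first step, while `t3` has to wait until the token produced by `t1` becomes visible.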
Static scheduling for out-of-order instruction issue processors
D. Tate, G. Steven, F. Steven
Superscalar processors strive to increase the number of instructions issued in each processor cycle. Compilers therefore need to expose as much Instruction Level Parallelism (ILP) as possible by using increasingly complex code optimisations. However, the knowledge base of instruction scheduling is focused on in-order instruction issue. It has previously been determined that aggressive static instruction scheduling impedes the speedup achieved by out-of-order instruction issue given an ideal environment. This paper examines how the scheduling process impairs the performance of out-of-order instruction issue. The use of Boolean guards, function in-lining, register renaming and percolation both between basic blocks and around loop back edges is evaluated. The results show that removing Boolean guards and severely limiting percolation while retaining function in-lining produces an improvement over unscheduled benchmarks.
DOI: https://doi.org/10.1109/ACAC.2000.824329 (published 2000-01-31)
Citations: 3
Dataflow Java: implicitly parallel Java
Gareth Lee, John Morris
Dataflow computation models enable simpler and more efficient management of the memory hierarchy, a key barrier to the performance of many parallel programs. This paper describes a dataflow language based on Java. Use of the dataflow model enables a programmer to generate parallel programs without explicit directions for message passing, work allocation and synchronisation. A small handful of additional syntactic constructs are required. A pre-processor is used to convert Dataflow Java programs to standard portable Java. The underlying run-time system was easy to implement using Java's object modelling and communications primitives. Although raw performance lags behind an equivalent C-based system, we were able to demonstrate useful speedups in a heterogeneous environment, thus amply illustrating the potential power of the Dataflow Java approach to use all machines, of whatever type, that might be available on a network when Java JIT compiler technology matures.
DOI: https://doi.org/10.1109/ACAC.2000.824321 (published 2000-01-31)
Citations: 7
Parallel architecture for the implementation of the embedded zerotree wavelet algorithm
H. Cheung, L. Ang, K. Eshraghian
We propose a parallel architecture for the implementation of the embedded zerotree wavelet (EZW) algorithm, based on the depth-first search (DFS) bit stream (BS) architecture. Using the depth-first search of the wavelet coefficient tree, the wavelet coefficients in the coefficient tree are first partitioned into independent sub-trees. In the case of full parallelism, each of the sub-trees is processed by an independent processor. The output from each processor is then multiplexed back into a single output bit stream. While the output bit stream from each sub-tree processor is in the depth-first search format, the overall multiplexed output bit stream represents the search of the sub-trees in parallel. The implementation of each sub-tree EZW processor is based on the DFS BS architecture, which accepts the bits of the coefficients from a sub-tree in decreasing order of significance. All the bits in a significant bit plane are processed to produce the output bit stream from the architecture in one scan of the sub-trees. The use of the DFS BS structure also makes partial parallelism possible, where a sub-tree processor can process two or more sub-trees in sequence. This provides flexibility to design the overall processor so that it optimally matches the speed of the overall input bit stream. The emphasis in this paper is on the parallel processing aspect of the DFS BS architecture. A sub-tree processor can be easily modified to perform any improved EZW algorithm, and the multiplexer for the output bit streams from the processors can be modified to produce the format of the EZW algorithm based on other tree searching schemes similar to the SPIHT algorithm.
DOI: https://doi.org/10.1109/ACAC.2000.824316 (published 2000-01-31)
Citations: 3
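The partition-scan-multiplex structure described in the EZW abstract can be sketched in a few lines. The sketch below covers only the traversal order, not the EZW coding itself: trees are plain `(value, children)` tuples, the function names are hypothetical, and the round-robin multiplexing policy is an assumption, since the abstract does not say how the per-processor streams are interleaved.

```python
from itertools import zip_longest


def dfs(tree):
    """Yield the values of a (value, children) tree in depth-first order."""
    value, children = tree
    yield value
    for child in children:
        yield from dfs(child)


def multiplex(streams):
    """Interleave streams round-robin (assumed policy), skipping exhausted ones.

    Each output item is tagged with the index of the stream it came from.
    """
    sentinel = object()
    for group in zip_longest(*streams, fillvalue=sentinel):
        for index, item in enumerate(group):
            if item is not sentinel:
                yield (index, item)


def parallel_scan(root):
    """Split the root's children into sub-trees, scan each, then multiplex."""
    _, subtrees = root
    # Each list stands in for the output of one independent sub-tree processor.
    streams = [list(dfs(subtree)) for subtree in subtrees]
    return list(multiplex(streams))


# Hypothetical coefficient tree with two sub-trees under the root.
tree = ("r", [("a", [("a1", []), ("a2", [])]), ("b", [("b1", [])])])
order = parallel_scan(tree)
```

Within each tag the items keep their depth-first order, while the combined sequence alternates between sub-trees, which mirrors the abstract's description of the multiplexed output.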
Micro-threading: a new approach to future RISC
C. Jesshope, Bing Luo
This paper briefly reviews the current research into RISC microprocessor architecture, which now seems to be so complex as to make the acronym somewhat of an oxymoron. In response to this development we present a new approach to RISC micro-architecture named micro-threading. Micro-threading exploits instruction-level parallelism by multi-threading, but the threads are all assumed to be drawn from the same context and are thus represented by just a program counter. This approach attempts to overcome the limits of RISC instruction control (branch, loop, etc.) and data control (data miss, etc.) by providing such a low context switch time that it can be used not only to tolerate high latency memory but also to avoid speculation in instruction execution. It is therefore able to provide a more efficient approach to instruction pipelining. In order to demonstrate this approach we compile simple examples to illustrate the concept of micro-threading within the same context. Then one possible architecture of a micro-threaded pipeline is presented in detail. Finally, we give some comparisons and a conclusion.
DOI: https://doi.org/10.1109/ACAC.2000.824320 (published 2000-01-31)
Citations: 42
Fast address-space switching on the StrongARM SA-1100 processor
Adam Wiggins, G. Heiser
The StrongARM SA-1100 is a high-speed low-power processor aimed at embedded and portable applications. Its architecture features virtual caches and TLBs which are not tagged by an address-space identifier. Consequently, context switches on that processor are potentially very expensive, as they may require complete flushes of TLBs and caches. This paper presents the design of an address-space management technique for the StrongARM which minimises TLB and cache flushes and thus context switching costs. The basic idea is to implement the top level of the (hardware-walked) page table as a cache for page directory entries for different address spaces. This allows switching address spaces with minimal overhead as long as the working sets do not overlap. For small (≤ 32 MB) address spaces further improvements are possible by making use of the StrongARM's re-mapping facility. Our technique is discussed in the context of the L4 microkernel in which it will be implemented.
DOI: https://doi.org/10.1109/ACAC.2000.824330 (published 2000-01-31)
Citations: 21
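The idea in the StrongARM abstract of using the top-level page table as a cache for page directory entries from several address spaces can be pictured with a toy model. This is not the paper's design: the class `CachingPageDirectory` and its methods are hypothetical, protection between address spaces is ignored entirely, and the rule that displacing another space's entry costs one flush is an assumption made for illustration.

```python
class CachingPageDirectory:
    """Toy model: one shared top-level table whose slots cache entries
    belonging to different address spaces."""

    def __init__(self, num_slots):
        self.owner = [None] * num_slots  # address space owning each slot
        self.current = None
        self.flushes = 0

    def switch(self, space):
        # No flush on a switch in this model; other spaces' entries stay put.
        self.current = space

    def access(self, slot):
        """Touch top-level slot `slot` on behalf of the current address space."""
        if self.owner[slot] == self.current:
            return "hit"
        collided = self.owner[slot] is not None
        self.owner[slot] = self.current
        if collided:
            # Assumption: replacing another space's entry forces a flush.
            self.flushes += 1
            return "collision"
        return "fill"


pd = CachingPageDirectory(num_slots=8)
pd.switch("A")
pd.access(0)   # A's working set uses slot 0
pd.switch("B")
pd.access(1)   # B's working set uses slot 1, so there is no overlap
pd.switch("A")
pd.access(0)   # A's entry is still cached after switching back
```

As long as the two spaces touch disjoint slots, switching back and forth never increments `flushes`; only an overlapping access, such as B touching slot 0, does.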
The circuit object organisation library 电路对象组织库
B. Gunther
The Circuit Object Organisation Library is a C++ class library for developing continuously executing circuit generator programs used in real-time, adaptive reconfigurable computing applications. A C++ program linked with COOL can execute autonomously, since COOL provides a high-speed place and route facility for realising fine grained FPGA circuits from object-oriented structural descriptions. With COOL the need for separate hardware description and software programming languages disappears. The class inheritance concept is used to define specialised circuits, composed of gate, port, and wire objects. An applications programming interface borrowing from graphical user interface toolkits, automatic storage reclamation, and use of operator overloading make circuit description intuitive and relatively accessible to developers without a strong hardware background. COOL features constructive placement algorithms, and a two-stage router that minimises average run time, yet handles difficult routes via a last-resort Lee maze router. Preliminary tests reveal that COOL can realise circuits at rates of tens of thousands of gates per second on a low-end PC.
DOI: https://doi.org/10.1109/ACAC.2000.824319 (published 2000-01-31)
Citations: 2
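The COOL abstract names a Lee maze router as its last-resort routing stage. The classic Lee algorithm is a breadth-first wave expansion over a grid followed by a backtrace, and a minimal version is sketched below. It is written in Python rather than COOL's C++, the function name `lee_route` and the boolean-grid representation are hypothetical, and it says nothing about how COOL's own router is actually structured.

```python
from collections import deque


def lee_route(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid[r][c] is True where the cell is blocked. The start cell is
    assumed to be free.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    # Wave expansion: label each reachable cell with its distance from start.
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not grid[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))
    if goal not in dist:
        return None
    # Backtrace: walk from the goal to any neighbour one step closer to start.
    path = [goal]
    while path[-1] != start:
        r, c = path[-1]
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(nb) == dist[path[-1]] - 1:
                path.append(nb)
                break
    return path[::-1]


# Hypothetical 3x3 routing area with a two-cell wall in the middle column.
area = [
    [False, True, False],
    [False, True, False],
    [False, False, False],
]
route = lee_route(area, (0, 0), (0, 2))
```

The wave expansion visits every reachable cell in the worst case, which is consistent with the abstract's choice to reserve the maze router for difficult routes and keep the average run time low with a faster first stage.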
Reconfigurable computing based on universal configurable blocks-a new approach for supporting performance- and realtime-dominated applications
Christian Siemers, Sybille Siemers
A novel architecture for reconfigurable computing based on a coarse-grain FPGA-like architecture is introduced. The basic blocks contain all arithmetical and logical capacities as well as some registers and will be programmable by sequential instruction streams produced by a software compiler. Reconfiguration is related to hyper-blocks of instructions. For the composed reconfigurable processors a classification is introduced for describing realtime, multithreading and performance capabilities.
DOI: https://doi.org/10.1109/ACAC.2000.824328 (published 2000-01-31)
Citations: 5
On the feasibility of fixed-length block structured architectures
L. Eeckhout, K. D. Bosschere, H. Neefs
Scaling contemporary superscalar microarchitectures to higher levels of parallelism in future technologies seems to be impractical due to the increasing complexity. In this paper, we show that a fixed-length block structured instruction set architecture (BSA) is capable of reducing the hardware complexity and is therefore feasible as an alternative architectural paradigm for traditional architectures with large virtual window sizes for future technologies. This is reached through two major interventions. First, statically grouping instructions from various basic blocks into larger atomic units of work with a fixed length, called blocks, makes fetching easier. Second, a decentralized microarchitecture reduces the processor core logic significantly, resulting in higher clock frequencies. The performance evaluation methodology used in this paper considers both IPC (number of useful instructions retired per clock cycle) and clock cycle period. In addition, a broad design space is explored by quantifying the influence of various microarchitectural parameters on overall performance.
DOI: https://doi.org/10.1109/ACAC.2000.824318 (published 2000-01-31)
Citations: 3
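The block structured architecture abstract evaluates designs on both IPC and clock cycle period, and a short worked example shows why both matter: delivered throughput is IPC divided by the cycle period, so a design with lower IPC can still win if it clocks faster. The function names and all figures below are hypothetical and are not results from the paper.

```python
def instructions_per_ns(ipc, clock_period_ns):
    """Throughput = useful instructions per cycle / nanoseconds per cycle."""
    return ipc / clock_period_ns


def speedup(ipc_a, period_a_ns, ipc_b, period_b_ns):
    """How many times faster design A is than design B."""
    return instructions_per_ns(ipc_a, period_a_ns) / instructions_per_ns(ipc_b, period_b_ns)


# Made-up figures: A retires fewer instructions per cycle but has a
# cycle period half as long as B's.
ratio = speedup(ipc_a=3.0, period_a_ns=1.0, ipc_b=4.0, period_b_ns=2.0)
```

With these made-up figures design A delivers 3.0 instructions per nanosecond against 2.0 for design B, so comparing IPC alone would rank the two designs the wrong way round.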