
[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture: Latest Publications

Fast Prolog with an extended general purpose architecture
Bruce K. Holmer, B. Sano, M. Carlton, P. V. Roy, R. Haygood, W. Bush, A. Despain, J. Pendleton, T. Dobry
Most Prolog machines have been based on specialized architectures. The authors' goal is to start with a general-purpose architecture and determine a minimal set of extensions for high-performance Prolog execution. They have developed both the architecture and optimizing compiler simultaneously, drawing on results of previous implementations. They find that most Prolog-specific operations can be done satisfactorily in software; however, there is a crucial set of features that the architecture must support to achieve the best Prolog performance. The emphasis in this study is on the authors' architecture and instruction set. The costs and benefits of the special architectural features and instructions are analyzed. Simulated performance results are presented and indicate a peak compiled Prolog performance of 3.68 million logical inferences per second.
Citations: 53
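The abstract claims that most Prolog-specific operations can be done satisfactorily in software. As a hedged illustration of what such an operation looks like, here is a minimal Python sketch of variable dereferencing, the inner loop of unification; the `(tag, value)` representation and the `deref` helper are assumptions for illustration, not the paper's actual encoding.

```python
# Minimal sketch of a software tag check (assumed representation, not the
# paper's encoding): a term is a (tag, value) pair; an unbound variable is
# a heap cell that references itself.
REF, CONST = "ref", "const"

def deref(heap, term):
    """Follow reference chains until reaching a non-REF term or a
    self-reference (an unbound variable)."""
    tag, val = term
    while tag == REF:
        cell = heap[val]
        if cell == term:        # self-reference: unbound variable
            return term
        term = cell
        tag, val = term
    return term

heap = {0: (REF, 1), 1: (CONST, 42)}
print(deref(heap, (REF, 0)))    # -> ('const', 42)
```

The point of the paper is that a chain-following loop like this runs adequately as plain instructions, while a small set of features (e.g. tag manipulation support) must come from the architecture.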
Monsoon: an explicit token-store architecture
G. Papadopoulos, D. Culler
Data-flow architectures tolerate long unpredictable communication delays and support generation and coordination of parallel activities directly in hardware, instead of assuming that program mapping will cause these issues to disappear. However, the proposed mechanisms are complex and introduce new mapping complications. A greatly simplified approach to data-flow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon are presented. The essence of dynamic data-flow execution is captured by a simple transition on state bits associated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dynamically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann architecture. Although the addressing capability is restrictive, there is exactly one instruction executed for each action on the data-flow graph. Thus, the machine-originated ETS model provides new understanding of the merits and the real cost of direct execution of data-flow graphs.
Citations: 7
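The "simple transition on state bits" described in the abstract can be sketched as follows. The `process_token` helper and the one-slot-per-node framing are illustrative assumptions, not Monsoon's actual implementation: the first token to reach a two-input node stores its value and sets a presence bit in the activation frame; the second finds the bit set and fires the node.

```python
# Toy sketch of the ETS presence-bit transition (assumed simplification).
# Each frame slot holds (presence_bit, stored_value).

def process_token(frame, slot, value, op):
    """Return the node's result if this token completes the operand pair,
    otherwise store the value, set the presence bit, and return None."""
    present, stored = frame[slot]
    if not present:
        frame[slot] = (True, value)   # first operand: store and wait
        return None
    frame[slot] = (False, None)       # second operand: clear slot and fire
    return op(stored, value)

frame = {3: (False, None)}
assert process_token(frame, 3, 5, lambda a, b: a + b) is None   # first token
assert process_token(frame, 3, 7, lambda a, b: a + b) == 12     # second fires
```

Because the compiler assigns each node a fixed slot, this check is a single local state transition rather than an associative matching-store lookup, which is the simplification the paper argues for.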
Generation and analysis of very long address traces
A. Borg, R. Kessler, D. W. Wall
Existing methods of generating and analyzing traces suffer from a variety of limitations, including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC (complex-instruction-set-computer) machines. The authors use a trace-generation mechanism based on link-time code modification which is simple to use, generates accurate long traces of multiuser programs, runs on a RISC (reduced-instruction-set-computer) machine, and can be flexibly controlled. Accurate performance data for large second-level caches can be obtained by on-the-fly analysis of the traces. A comparison is made of the performance of systems with 512 KB to 16 MB second-level caches, and it is shown that, for today's large programs, second-level caches of more than 4 MB may be unnecessary. It is also shown that set associativity in second-level caches of more than 1 MB does not significantly improve system performance. In addition, the experiments provide insights into first-level and second-level cache line size.
Citations: 137
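On-the-fly analysis of a trace amounts to feeding each address through a cache model as it is generated, so the full trace never needs to be stored. A minimal sketch, assuming a direct-mapped cache and counting only misses; the paper's actual simulator is far more detailed.

```python
def simulate_cache(trace, cache_bytes, line_bytes):
    """Run an address trace through a direct-mapped cache model and
    return the miss count (illustrative, not the paper's simulator)."""
    n_lines = cache_bytes // line_bytes
    lines = [None] * n_lines          # tag store: one line address per set
    misses = 0
    for addr in trace:
        line_addr = addr // line_bytes
        idx = line_addr % n_lines
        if lines[idx] != line_addr:   # tag mismatch: miss, then fill
            lines[idx] = line_addr
            misses += 1
    return misses

# Addresses 0 and 4096 conflict in a 4 KB cache with 64-byte lines.
print(simulate_cache([0, 64, 0, 64, 4096, 0], 4096, 64))  # -> 4
```

Running such a model concurrently with trace generation is what lets the authors evaluate multi-megabyte second-level caches without writing enormous trace files to disk.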
A distributed I/O architecture for HARTS
K. Shin, G. Dykema
The issue of I/O device access in HARTS (Hexagonal Architecture for Real-Time Systems), a distributed real-time computer system under construction at the University of Michigan, is explicitly addressed. Several candidate solutions are introduced, explored, and evaluated according to cost, complexity, reliability, and performance: (1) 'node-direct' distribution with the intranode bus and a local I/O bus; (2) use of dedicated I/O nodes, which are placed in the hexagonal mesh as regular application nodes but provide I/O services rather than computing services; and (3) use of a separate I/O network, which has led to the proposal of an 'interlaced' I/O network. The interlaced I/O network is intended to provide both high performance, without burdening node processors with I/O overhead, and a high degree of reliability. Both static and dynamic multiownership protocols are developed for managing I/O device access in this I/O network. The relative merits of the two protocols are explored, and the performance and accessibility which each provides are simulated.
Citations: 7
Multiple instruction issue in the NonStop Cyclone processor
R. Horst, R. L. Harris, Robert L. Jardine
The architecture for issuing multiple instructions per clock in the NonStop Cyclone processor is described. Pairs of instructions are fetched and decoded by a dual two-stage prefetch pipeline and passed to a dual six-stage pipeline for execution. Dynamic branch prediction is used to reduce branch penalties. A unique microcode routine for each pair is stored in the large duplexed control store. The microcode controls parallel data paths optimized for executing the most frequent instruction pairs. Other features of the architecture include cache support for unaligned double-precision accesses, a virtually addressed main memory, and a novel precise exception mechanism.
Citations: 65
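The pairing idea can be sketched abstractly: adjacent instructions are grouped whenever a dedicated routine exists for the pair, and fall back to single issue otherwise. The `PAIR_ROUTINES` set and the opcode names below are invented for illustration; in the real machine each pair indexes a microcode routine in the duplexed control store.

```python
# Hypothetical sketch of pair-based issue grouping (opcodes and the set of
# supported pairs are assumptions, not Cyclone's actual instruction set).
PAIR_ROUTINES = {("load", "add"), ("cmp", "branch")}

def issue(stream):
    """Group an instruction stream into issue groups: a pair when a paired
    routine exists for the two adjacent opcodes, otherwise a singleton."""
    groups, i = [], 0
    while i < len(stream):
        if i + 1 < len(stream) and (stream[i], stream[i + 1]) in PAIR_ROUTINES:
            groups.append((stream[i], stream[i + 1]))
            i += 2
        else:
            groups.append((stream[i],))
            i += 1
    return groups

print(issue(["load", "add", "store", "cmp", "branch"]))
# -> [('load', 'add'), ('store',), ('cmp', 'branch')]
```

Restricting pairing to a fixed table of frequent combinations is what lets the data paths be optimized per pair instead of requiring general dual issue.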
Reducing the cost of branches by using registers
J. Davidson, D. Whalley
In an attempt to reduce the number of operand memory references, many RISC (reduced-instruction-set-computer) machines have 32 or more general-purpose registers (e.g. MIPS, ARM, Spectrum, 88K). Without special compiler optimizations, such as inlining or interprocedural register allocation, it is rare that a computer will use a majority of these registers for a function. The authors explore the possibility of using some of these registers to hold branch target addresses and the corresponding instruction at each branch target. To evaluate the effectiveness of this scheme, two machines were designed and emulated. One machine had 32 general-purpose registers used for data references, while the other machine had 16 data registers and 16 registers used for branching. The results show that using registers for branching can effectively reduce the cost of transfers of control.
Citations: 16
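A back-of-the-envelope cost model makes the trade-off concrete. The cycle counts below are assumed purely for illustration and are not the paper's measurements: a taken branch resolved from a branch register is charged one cycle, while one resolved through the normal fetch path pays an extra penalty.

```python
# Illustrative cost model with assumed cycle counts (not the paper's data).
def branch_cycles(taken_branches, in_register_fraction, penalty=2):
    """Cycles spent on taken branches: one cycle each, plus a fetch
    penalty for branches not resolved from branch registers."""
    slow = taken_branches * (1 - in_register_fraction)
    return taken_branches + slow * penalty

# If 3/4 of taken branches hit a branch register:
print(branch_cycles(1000, 0.75))  # -> 1500.0, vs. 3000.0 with no branch registers
```

The model only shows the shape of the saving; the paper's emulation of the two 32-register and 16+16-register machines is what quantifies it.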
Synchronization with multiprocessor caches
Joonwon Lee, U. Ramachandran
A new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism is presented. With this scheme high-level synchronization primitives, as well as low-level ones, can be implemented without excessive overhead. Cost functions for well-known synchronization methods are derived for invalidation schemes, write update schemes, and the authors' lock-based scheme. To predict the performance implications of the new scheme accurately, a new simulation model embodying a widely accepted paradigm of parallel programming is developed. It is shown that the authors' lock-based protocol outperforms existing cache protocols.
Citations: 50
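The idea of folding a lock into cache-line state can be sketched with a toy state machine; the state names and acquire/release behavior below are illustrative assumptions, not the authors' actual protocol. The point is that once a processor holds the line exclusively, lock acquisition and spinning stay local to the cache instead of generating bus traffic.

```python
# Toy sketch of a lock bit carried alongside coherence state (assumed
# states and transitions, not the protocol from the paper).
class CacheLine:
    def __init__(self):
        self.state = "invalid"     # invalid / shared / exclusive
        self.locked = False

    def acquire(self):
        """Try to take the line's lock; return True on success."""
        if self.state != "exclusive":
            self.state = "exclusive"   # coherence transaction would occur here
        if self.locked:
            return False               # lock held: spin locally in the cache
        self.locked = True
        return True

    def release(self):
        self.locked = False

line = CacheLine()
assert line.acquire() is True    # first acquire succeeds
assert line.acquire() is False   # second attempt spins
line.release()
assert line.acquire() is True    # succeeds again after release
```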
The K2 parallel processor: architecture and hardware implementation
M. Annaratone, Marco Fillo, K. Nakabayashi, M. Viredaz
K2 is a distributed-memory parallel processor designed to support a multiuser, multitasking, time-sharing operating system and an automatically parallelizing Fortran compiler. The architecture and the hardware implementation of K2 are presented. The authors focus on the architectural features required by the operating system and the compiler. A prototype machine with 24 processors is currently being developed.
Citations: 18
Architectural support for the management of tightly-coupled fine-grain goals in Flat Concurrent Prolog
L. Alkalaj, T. Lang, M. Ercegovac
Architectural support is proposed for goal management as part of a special-purpose processor architecture for the efficient execution of Flat Concurrent Prolog. Goal management operations, namely, halt, spawn, suspend, and commit, are decoupled from goal reduction and overlapped in the goal management unit. Their efficient execution is enabled using a goal cache. The authors evaluate the performance of the goal management support using an analytic performance model and program parameters characteristic of the system's development workload. Most goal management operations are completely overlapped, resulting in a speedup of 2. Higher speedups are obtained for workloads that exhibit greater goal management complexity.
Citations: 4
VAX vector architecture
D. Bhandarkar, Richard Brunner
The VAX architecture has been extended to include an integrated, register-based vector processor. This extension allows both high-end and low-end implementations and can be supported, with only small changes, by the VAX/VMS and VAX/ULTRIX operating systems. The extension is effectively exploited by the new vectorizing capabilities of VAX Fortran. Features of the VAX vector architecture and the design decisions which make it a consistent extension of the VAX architecture are discussed.
Citations: 14