MICRO 22: latest papers

DOAS: an object oriented architecture supporting secure languages

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75409

A. J. Goor, H. Corporaal

Current software engineering practice relies heavily on the reliability of software implementation languages and underlying architectures. However, both the currently used languages and the traditional architectures suffer from a shortage of built-in security. In this paper, an architecture is presented that is heavily influenced by two properties of secure languages: coercion and exception handling. It is shown that proper design decisions lead to an architecture having a compact data representation, allowing both generic and nongeneric instructions. The architecture is object oriented, and object addressing is under control of the operand stream, with optimization possibilities to bypass descriptor inspection.

Citations: 0

Design and performance measurements of a parallel machine for the unification algorithm

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75398

F. Sibai, K. Watson, Mi Lu

Unification is known to be the most repeated operation in logic programming and PROLOG interpreters. To speed up the execution of logic programs, the performance of unification must be improved. We propose a parallel unification machine for speeding up the unification algorithm. The machine is simulated at the register transfer level, and the simulation results, together with a performance comparison against a serial unification coprocessor, are presented.
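For readers unfamiliar with the operation being accelerated, the following is a minimal serial sketch of first-order unification in Python. It is not the parallel machine described in the abstract; the term encoding (variables as capitalized strings, compound terms as tuples of functor followed by arguments) and the function names `is_var`, `walk`, `occurs`, and `unify` are assumptions made for illustration only.

```python
def is_var(t):
    # Assumption: a variable is a string starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings until reaching an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside term t under subst?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(a, b, subst=None):
    """Return a substitution dict that makes a and b equal, or None on failure."""
    subst = {} if subst is None else dict(subst)
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            # Same functor and arity: unify the arguments pairwise.
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None  # functor, arity, or constant mismatch
    return subst

# f(X, g(a)) unified with f(b, g(Y)) binds X to b and Y to a.
result = unify(("f", "X", ("g", "a")), ("f", "b", ("g", "Y")))
```

The pairwise argument step is where independent sub-unifications arise, which is the kind of work a parallel design could in principle distribute; how the authors' machine does so is not stated in the abstract.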

Citations: 0

Definition of elementary arithmetic operations by using ACM

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75414

S. D'Angelo, G. Sechi

Citations: 2

“Combining” as a compilation technique for VLIW architectures

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75401

T. Nakatani, K. Ebcioglu

Combining is a local compiler optimization technique that can enhance the performance of global compaction techniques for VLIW machines. Given two adjacent operations of a certain class that are flow (read-after-write) dependent and that cannot be placed in the same micro-instruction, the combining technique can transform the operations so that the modified operations have no dependence. The transformed operations can be executed in the same micro-instruction, thus allowing the total execution time of the program to be reduced. In this paper, combining a pair of flow-dependent operations into a wide instruction word is suggested as an important compilation technique for VLIW architectures. Combining is particularly effective with software pipelining and loop unrolling since combinable operations can come together with a higher probability when these compilation techniques are used. We implemented combining in our parallelizing compiler for the wide instruction word architecture, which is now being built at the IBM T. J. Watson Research Center. It is shown that a ten percent speedup is obtained on the Stanford integer benchmarks and other sequential-natured C programs, in comparison to compaction techniques that do not use combining. For a class of inner loops, combining can remove the inter-iteration dependencies completely and can improve performance in the same ratio as the loop is unrolled.
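The transformation described above can be illustrated with a small Python sketch. The abstract does not say which operation classes are combinable, so the sketch assumes one simple case, add-immediate operations of the form `dest = src + imm`; the names `AddImm`, `combine`, `run_sequential`, and `run_parallel` are hypothetical and are not taken from the authors' compiler.

```python
from typing import NamedTuple, Optional

class AddImm(NamedTuple):
    # Assumed operation class: dest = src + imm
    dest: str
    src: str
    imm: int

def combine(first: AddImm, second: AddImm) -> Optional[AddImm]:
    """Rewrite `second` so it no longer reads the result of `first`."""
    if second.src != first.dest:
        return None  # not flow dependent, nothing to combine
    if second.dest == first.dest:
        return None  # both would write the same register in one word
    # second.dest = (first.src + first.imm) + second.imm
    return AddImm(second.dest, first.src, first.imm + second.imm)

def run_sequential(ops, regs):
    # Execute operations one after another.
    regs = dict(regs)
    for op in ops:
        regs[op.dest] = regs[op.src] + op.imm
    return regs

def run_parallel(ops, regs):
    # Model one wide instruction word: every operation reads the old
    # register values, then all results are written back.
    results = {op.dest: regs[op.src] + op.imm for op in ops}
    return {**regs, **results}

first = AddImm("r1", "r2", 4)    # r1 = r2 + 4
second = AddImm("r3", "r1", 8)   # r3 = r1 + 8, flow dependent on r1
combined = combine(first, second)  # r3 = r2 + 12

regs = {"r1": 0, "r2": 10, "r3": 0}
before = run_sequential([first, second], regs)
after = run_parallel([first, combined], regs)
```

In this sketch the original pair needs two steps, while `first` and `combined` read only `r2` and can be issued in the same word with the same final register state.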

Citations: 33

A flexible VLSI core for an adaptable architecture

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75423

Hans M. Mulder, P. Stravers

Two major limitations concerning the design of cost-effective application-specific architectures are the recurrent costs of system-software development and hardware implementation, in particular VLSI implementation, for each architecture. The SCalable ARChitecture Experiment (SCARCE) aims to provide a framework for application-specific processor design. The framework allows scaling of functionality, implementation complexity, and performance. The SCARCE framework consists and will consist of: an architecture framework defining the constraints for the design of application-specific architectures; tools for synthesizing architectures from application or application-area; VLSI cell libraries and tools for quick generation of application-specific processors; and a system-software platform which can be retargeted quickly to fit the application-specific architecture. This paper concentrates on the micro-architecture framework of SCARCE and outlines the process of generating VLSI processors.

Citations: 5

Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75418

P. Chang, Wen-mei W. Hwu

A new instruction fetch method, forward semantic, is offered to enable deeply pipelined processors to fetch one useful instruction every cycle. Forward semantic is an improved alternative to delayed branching (with or without squashing), with five major advantages. First, no restriction is imposed on the type of instructions filling the branch slots, which allows a large number of slots to be filled. Second, no modification to the offsets and displacements is necessary when an instruction is copied to fill a branch slot, which simplifies the linker implementation. Third, an interrupted program can resume execution with a single program counter, eliminating the need for reloading the instruction pipeline before resuming execution. Fourth, programs compiled with N slots can execute on pipelines requiring K (K ≤ N) slots, which makes family architecture compatibility possible. Lastly, the filling of branch slots is totally transparent to code compaction and software interlocking schemes. These advantages combine to provide an efficient instruction fetch mechanism and to eliminate artificial penalties on branch cost. At the cost of 11% static code expansion, forward semantic achieves an instruction fetch cost of 1.2 cycles for pipelines requiring 10 slots for each taken branch. This level of instruction fetch efficiency has never been achieved before with conventional instruction fetch methods. The branch cost is dictated by the accuracy of the compile-time branch prediction rather than artificial limitations, such as data dependencies, which prevent the slots from being filled. These results are measured from the execution of real UNIX and CAD programs with complex control structures.

Citations: 9

On optimal loop parallelization

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75411

F. Gasperoni, U. Schwiegelshohn, K. Ebcioglu

The problem of automatic loop parallelization has received a lot of attention in the area of parallelizing compilers. Automatic loop parallelization can be achieved by several algorithms. In this paper we address the problem of time optimal parallelization of loops with conditional jumps. We prove that even for machines with unlimited resources there are simple loops for which no semantically and algorithmically equivalent time optimal program exists.

Citations: 4

On reordering instruction streams for pipelined computers

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75419

J. Shieh, C. Papachristou

This paper describes a method to reorder the straight line instruction streams for pipelined computers which have one instruction issue unit but may contain multiple function units. The objective is to make the most efficient usage of the pipelines within the computer system. The input to the scheduler is the intermediate code of a compiler, and is represented by a data dependence graph (DDG). The scheduler is a kind of list scheduler. The data dependence and the pipeline effect of the function units within the system have been considered for finding a most suitable time slot for each node during reordering time. The scheduler has been implemented and several scientific application programs have been tested. The results show that in most of the cases the scheduler will achieve the optimal result. The average instruction issue rate is over 96%. As a comparison, the issue rate of an ordinary compiler is only 22%; and the issue rate of a compiler with the effect of pipeline but without reordering the instruction stream is about 45%.
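As a rough illustration of list scheduling over a data dependence graph, the Python sketch below issues at most one instruction per cycle and delays a node until the results of all its predecessors are available. It is a generic textbook-style scheduler, not the authors' algorithm: the critical-path priority, the latency values, and the names `list_schedule`, `latency`, and `deps` are assumptions, and the DDG is assumed to be acyclic.

```python
def list_schedule(latency, deps):
    """latency: {node: cycles until its result is available}
    deps: {node: set of predecessor nodes in the DDG}
    Returns {node: issue cycle} for a single-issue machine."""
    succs = {n: [] for n in latency}
    for n, preds in deps.items():
        for p in preds:
            succs[p].append(n)

    # Priority of a node: longest latency path from it to the end of the DDG.
    priority = {}
    def height(n):
        if n not in priority:
            priority[n] = latency[n] + max((height(s) for s in succs[n]), default=0)
        return priority[n]
    for n in latency:
        height(n)

    issue = {}
    remaining = set(latency)
    cycle = 0
    while remaining:
        # A node is ready once every predecessor has issued and its result is available.
        ready = [n for n in remaining
                 if all(p in issue and issue[p] + latency[p] <= cycle
                        for p in deps.get(n, ()))]
        if ready:
            # Highest priority first; ties broken by name for determinism.
            pick = min(ready, key=lambda n: (-priority[n], n))
            issue[pick] = cycle
            remaining.remove(pick)
        cycle += 1  # an empty ready list models a pipeline stall
    return issue

# Hypothetical straight-line block with made-up latencies.
latency = {"load_a": 3, "load_b": 3, "add": 1, "mul": 2, "store": 1}
deps = {"add": {"load_a", "load_b"}, "mul": {"load_a"}, "store": {"add"}}
schedule = list_schedule(latency, deps)
```

With these made-up latencies the scheduler places `mul` in the cycle where `add` would otherwise wait for `load_b`, so the five instructions issue in six cycles with a single stall.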

Citations: 12

Incremental foresighted local compaction

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75415

Pantung Wijaya, V. Allan

Under timing constraints, local compaction may fail because of poor scheduling decisions. Su [SDWX87] uses foresight to avoid some of the poor scheduling decisions. However, the foresight takes a considerable amount of time. In this paper the Incremental Foresight algorithm is introduced. Experiments using four different target architectures show that the Incremental Foresight algorithm works as well as foresight, and saves around 48 percent of the excess time.

Citations: 8

MIES: a microarchitecture design tool

MICRO 22

Pub Date : 1989-08-01 DOI: 10.1145/75362.75422

J. Nestor, B. Soudan, Z. Mayet

This paper describes MIES, a design tool for the modeling, visualization, and analysis of VLSI microarchitectures. MIES combines a graphical data path model and symbolic control model and provides a number of user interfaces which allow these models to be created, simulated, and evaluated.

Citations: 11
