
Latest publications from [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture

Data prefetching in multiprocessor vector cache memories
John W. C. Fu, J. Patel
This paper reports the cache performance of a set of vectorized numerical programs from the Perfect Club benchmarks. Using a low cost trace driven simulation technique we show how a non-prefetching vector cache can result in unpredictable performance and how this unpredictability makes it difficult to find a good block size. We describe two simple prefetch schemes to reduce the influence of long stride vector accesses and misses due to block invalidations in multiprocessor vector caches. These two schemes are shown to have better performance than a non-prefetching cache.
doi:10.1145/115952.115959
Citations: 171
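The abstract does not spell out the two prefetch schemes. As a rough illustration of why prefetching helps a vector cache on sequential vector accesses, here is a toy direct-mapped cache with an assumed next-sequential prefetch-on-miss policy; the cache parameters and the trace are invented for the sketch:

```python
def simulate(addresses, block_size, num_blocks, prefetch=False):
    """Direct-mapped cache; optionally prefetch the next block on a miss."""
    tags = [None] * num_blocks          # one tag per cache set
    misses = 0

    def install(block):
        tags[block % num_blocks] = block

    for addr in addresses:
        block = addr // block_size
        if tags[block % num_blocks] != block:
            misses += 1
            install(block)
            if prefetch:                # next-sequential prefetch on a miss
                install(block + 1)
    return misses

# A unit-stride vector sweep: 256 accesses, 4 per 16-byte block
trace = list(range(0, 1024, 4))
base = simulate(trace, block_size=16, num_blocks=8)
pref = simulate(trace, block_size=16, num_blocks=8, prefetch=True)
```

With the prefetch enabled, every other block is already resident when the sweep reaches it, halving the miss count on this trace.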
Classification and performance evaluation of instruction buffering techniques
L. John, P. T. Hulina, L. D. Coraor, Dhamir N. Mannai
The speed disparity between processor and memory subsystems has been bridged in many existing large-scale scientific computers and microprocessors with the help of instruction buffers or instruction caches. In this paper we classify these buffers into traditional instruction buffers, conventional instruction caches and prefetch queues, detail their prominent features, and evaluate the performance of buffers in several existing systems, using trace driven simulation. We compare these schemes with a recently proposed queue-based instruction cache memory. An implementation independent performance metric is proposed for the various organizations and used for the evaluations. We analyze the simulation results and discuss the effect of various parameters such as prefetch threshold, bus width and buffer size on performance.
doi:10.1145/115952.115968
Citations: 10
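Of the three classes the paper names, the prefetch queue is the simplest to model: idle memory cycles fill a small queue with the next sequential instruction words, and a taken branch flushes it. A minimal sketch (the queue depth and trace are made up, not taken from the paper):

```python
from collections import deque

def queue_hits(trace, depth):
    """Sequential prefetch queue: idle cycles fill the queue with the next
    sequential words; a fetch outside the queue (a taken branch) flushes it
    and restarts prefetching at the branch target."""
    queue = deque()
    next_fill = 0
    hits = 0
    for pc in trace:
        if pc in queue:
            hits += 1
            while queue[0] != pc:       # discard words skipped over
                queue.popleft()
            queue.popleft()             # consume the fetched word
        else:
            queue.clear()               # branch target missed the queue
            next_fill = pc + 1
        while len(queue) < depth:       # prefetcher refills the queue
            queue.append(next_fill)
            next_fill += 1
    return hits

# Straight-line code, one taken branch, then straight-line again
trace = list(range(0, 10)) + list(range(100, 105))
hits = queue_hits(trace, depth=4)
```

Only the two non-sequential fetches miss; every straight-line fetch is satisfied from the queue, which is why such buffers pay off mainly on code with large basic blocks.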
An empirical study of the CRAY Y-MP processor using the PERFECT club benchmarks
S. Vajapeyam, G. Sohi, W. Hsu
Characterization of machines, by studying program usage of their architectural and organizational features, is an essential part of the design process. In this paper we report an empirical study of a single processor of the CRAY Y-MP, using as benchmarks long-running scientific applications from the PERFECT Club benchmark set. Since the compiler plays a major role in determining machine utilization and program execution speed, we compile our benchmarks using the state-of-the-art Cray Research production FORTRAN compiler. We investigate instruction set usage, operation execution counts, sizes of basic blocks in the programs, and instruction issue rate. We observe, among other things, that the vectorized fraction of the dynamic program operation count ranges from 4% to %% for our benchmarks. Instructions that move values between the scalar registers and corresponding backup registers form a significant fraction of the dynamic instruction count. Basic blocks which are more than a hundred instructions in size are significant in number; both small and large basic blocks are important from the point of view of program performance.
doi:10.1145/115952.115970
Citations: 20
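The basic-block measurement the authors describe can be mimicked on a symbolic dynamic trace: cut the trace at every control transfer and record the segment lengths. The trace format below is invented for illustration:

```python
def block_sizes(trace):
    """Split a dynamic instruction trace into basic blocks.
    `trace` is a list of (opcode, ends_block) pairs, where ends_block
    marks any control transfer (branch, jump, call, return)."""
    sizes, current = [], 0
    for _, ends_block in trace:
        current += 1
        if ends_block:
            sizes.append(current)
            current = 0
    if current:                          # trailing block with no terminator
        sizes.append(current)
    return sizes

# A small loop body followed by a long vectorizable straight-line block
trace = ([("op", False)] * 5 + [("br", True)]
         + [("op", False)] * 120 + [("br", True)])
sizes = block_sizes(trace)
```

On this toy trace the histogram already shows the paper's point: both a small block (6 instructions) and a block of more than a hundred instructions appear, and each matters for a different part of the machine (branch handling versus sustained issue).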
Instruction level profiling and evaluation of the IBM RS/6000
Chriss Stephens, B. Cogswell, J. Heinlein, Gregory Palmer, John Paul Shen
This paper reports preliminary results from using goblin, a new instruction level profiling system, to evaluate the IBM RISC System/6000 architecture. The evaluation presented is based on the SPEC benchmark suite. Each SPEC program (except gcc) is processed by goblin to produce an instrumented version. During execution of the instrumented program, profiling routines are invoked which trace the execution of the program. These routines also collect statistics on dynamic instruction mix, branching behavior, and resource utilization. Based on these statistics, the actual performance and the architectural efficiency of the RS/6000 are evaluated. In order to provide a context for this evaluation, a comparison to the DECStation 3100 is also presented. The entire profiling and evaluation experiment on nine of the ten SPEC programs involves tracing and analyzing over 32 billion instructions on the RS/6000. The evaluation indicates that for the SPEC benchmark suite the architecture of the RS/6000 is well balanced and exhibits impressive performance, especially on the floating-point intensive applications.
doi:10.1145/115952.115971
Citations: 50
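goblin instruments real binaries; its bookkeeping (dynamic instruction mix, branching behavior) reduces to counters updated per executed instruction. A stand-in over a symbolic trace, with invented instruction categories:

```python
from collections import Counter

def profile(trace):
    """Tally dynamic instruction mix and taken-branch rate from a
    symbolic trace of (category, taken) records, the kind of statistics
    an instrumentation-based profiler accumulates."""
    mix = Counter()
    branches = taken = 0
    for category, was_taken in trace:
        mix[category] += 1
        if category == "branch":
            branches += 1
            taken += was_taken
    return mix, (taken / branches if branches else 0.0)

# A floating-point-heavy toy trace
trace = ([("fp", False)] * 6 + [("load", False)] * 3
         + [("branch", True)] * 2 + [("branch", False)])
mix, taken_rate = profile(trace)
```

Summed over billions of instructions, counters of exactly this shape are what let the paper compare resource utilization across machines.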
Evaluation of memory system extensions
Kai Li, K. Petersen
A traditional memory system for a uniprocessor consists of one or two levels of cache, a main memory and a backing store. One can extend such a memory system by adding inexpensive but slower memories into the memory hierarchy. This paper uses an experimental approach to evaluate two methods of extending a memory system: direct and caching. The direct method adds the slower memory into the memory hierarchy by putting it at the same level as the main memory, allowing the CPU to access the slower memories directly; whereas the caching method puts the slower memory between the main memory and the backing store, using the main memory as a cache for the slower memory. We have implemented both approaches and our experiments indicate that applications with very large data structures can benefit significantly using an extended memory system, and that the direct approach outperforms the caching approach in memory-bound applications.
doi:10.1109/ISCA.1991.1021602
Citations: 18
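A crude cost model (all latencies, page sizes, and traces invented, not the paper's measurements) reproduces the qualitative finding: a streaming, memory-bound scan favors the direct organization, while access patterns with reuse favor caching:

```python
def direct_cost(accesses, fast_size, t_fast, t_slow):
    """Direct: the slow memory sits beside main memory in the address
    space, so the CPU pays its full latency on every access there."""
    return sum(t_fast if a < fast_size else t_slow for a in accesses)

def caching_cost(accesses, fast_size, page, t_fast, t_miss):
    """Caching: main memory acts as an LRU page cache for the slow
    memory; a miss pays a whole-page transfer."""
    frames = fast_size // page
    resident = []                        # LRU order, oldest first
    cost = 0
    for a in accesses:
        p = a // page
        if p in resident:
            resident.remove(p)
            cost += t_fast
        else:
            cost += t_miss
            if len(resident) == frames:
                resident.pop(0)          # evict least recently used
        resident.append(p)
    return cost

stream = list(range(100, 200))           # no reuse: scan the slow region once
reuse = list(range(100, 120)) * 10       # heavy reuse of two pages

d_stream = direct_cost(stream, 40, 1, 5)
c_stream = caching_cost(stream, 40, 10, 1, 50)
d_reuse = direct_cost(reuse, 40, 1, 5)
c_reuse = caching_cost(reuse, 40, 10, 1, 50)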
Strategies for achieving improved processor throughput
M. Farrens, A. Pleszkun
Deeply pipelined processors have relatively low issue rates due to dependencies between instructions. In this paper we examine the possibility of interleaving a second stream of instructions into the pipeline, which would issue instructions during the cycles the first stream was unable to. Such an interleaving has the potential to significantly increase the throughput of a processor without seriously impairing the execution of either process. We propose a dynamic interleaving of at most 2 instruction streams, which share the pipelined functional units of a machine. To support the interleaving of 2 instruction streams a number of interleaving policies are described and discussed. Finally, the amount of improvement in processor throughput is evaluated by simulating the interleaving policies for several machine variants.
doi:10.1145/115952.115988
Citations: 44
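The specific interleaving policies are not given in the abstract. A toy single-issue model with one assumed policy, "issue from the other stream whenever this one is waiting out a dependency bubble", shows the throughput effect (stall patterns invented):

```python
def run_cycles(streams):
    """streams: for each stream, a list of post-issue bubble counts
    (cycles the next instruction of that stream must wait, due to
    dependencies).  One issue slot per cycle; stream 0 has priority."""
    pending = [list(s) for s in streams]
    ready_at = [0] * len(streams)        # first cycle each stream may issue
    cycle = 0
    while any(pending):
        for i, p in enumerate(pending):
            if p and ready_at[i] <= cycle:
                gap = p.pop(0)
                ready_at[i] = cycle + 1 + gap
                break                    # the single issue slot is used
        cycle += 1
    return cycle

# Four instructions, each followed by one dependency bubble
single = run_cycles([[1, 1, 1, 1]])
dual = run_cycles([[1, 1, 1, 1], [1, 1, 1, 1]])
```

Alone, each stream issues on only every other cycle; interleaved, the second stream fills the first stream's bubbles, so twice the work finishes in barely more time than one stream took.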
Reducing memory contention in shared memory multiprocessors
D. Harper
Reducing Memory Contention in Shared Memory Multiprocessors. D. T. Harper III, Department of Electrical Engineering, The University of Texas at Dallas, P.O. Box 830688, NIP 33, Richardson, Texas 75083-0688 USA. (214) 690-2893
doi:10.1145/115952.115960
Citations: 4
IXM2: a parallel associative processor
T. Higuchi, T. Furuya, Ken'ichi Handa, Naoto Takahashi, H. Nishiyama, A. Kokubu
This paper describes a parallel associative processor, IXM2, developed mainly for semantic network processing. IXM2 consists of 64 associative processors and 9 network processors, having a total of 256K words of associative memory. The large associative memory enables 65,536 semantic network nodes to be processed in parallel and reduces the order of algorithmic complexity to O(1) in basic semantic net operations. It is shown that IXM2 has computing power comparable to that of a Connection Machine. Programming for IXM2 is performed with the knowledge representation language IXL, a superset of Prolog, so that IXM2 can be utilized as a back-end to AI workstations.
doi:10.1145/115952.115956
Citations: 18
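The O(1) claim rests on every associative memory word comparing itself against the search key simultaneously. A sequential toy model of that match step, on an invented is-a hierarchy (the data and class names are not from the paper):

```python
class AssociativeMemory:
    """Toy content-addressable store: a search compares the key against
    every stored word "in parallel" (modeled here as one pass), returning
    all matching nodes at once rather than walking the network."""
    def __init__(self):
        self.words = []                  # (node, relation, target) triples

    def insert(self, node, relation, target):
        self.words.append((node, relation, target))

    def search(self, relation, target):
        """All nodes related to `target` by `relation`: one marker-
        propagation step over the semantic network."""
        return {n for (n, r, t) in self.words if r == relation and t == target}

mem = AssociativeMemory()
mem.insert("canary", "is-a", "bird")
mem.insert("ostrich", "is-a", "bird")
mem.insert("bird", "is-a", "animal")
birds = mem.search("is-a", "bird")
```

In hardware the search step costs one cycle regardless of how many triples are stored, which is how a query that would take a pointer-chasing machine time proportional to fan-out becomes constant-time.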
Virtualizing the VAX architecture
J. S. Hall, Paul T. Robinson
This paper describes modifications to the VAX architecture to support virtual machines. The VAX architecture contains several instructions that are sensitive but not privileged. It is also the first architecture with more than two protection rings to support virtual machines. A technique for mapping four virtual rings onto three physical rings, employing both software and microcode, is described. Differences between the modified and standard VAX architectures are presented, along with a description of the virtual VAX computer.
doi:10.1145/115952.115990
Citations: 25
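The abstract does not give the exact four-onto-three mapping. One plausible ring compression, assumed here purely for illustration, reserves physical ring 0 for the virtual machine monitor and folds the guest's kernel and executive rings together (the paper's software and microcode then have to tell the two folded rings apart):

```python
def physical_ring(virtual_ring):
    """Map a guest VAX ring (0=kernel .. 3=user) onto a physical ring,
    with physical ring 0 held back for the VMM.  The fold of rings 0 and
    1 is an assumption for this sketch, not the paper's stated mapping."""
    if virtual_ring not in (0, 1, 2, 3):
        raise ValueError("VAX has rings 0-3")
    return {0: 1, 1: 1, 2: 2, 3: 3}[virtual_ring]
```

The cost of any such compression is that two guest privilege levels become indistinguishable to the hardware, which is exactly the gap the paper's combined software/microcode technique must close.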
Multithreading: a revisionist view of dataflow architectures
G. Papadopoulos, K. R. Traub
Although they are powerful intermediate representations for compilers, pure dataflow graphs are incomplete, and perhaps even undesirable, machine languages. They are incomplete because it is hard to encode critical sections and imperative operations which are essential for the efficient execution of operating system functions, such as resource management. They may be undesirable because they imply a uniform dynamic scheduling policy for all instructions, preventing a compiler from expressing a static schedule which could result in greater run time efficiency, both by reducing redundant operand synchronization, and by using high speed registers to communicate state between instructions. In this paper, we develop a new machine-level programming model which builds upon two previous improvements to the dataflow execution model: sequential scheduling of instructions, and multiported registers for expression temporaries. Surprisingly, these improvements have required almost no architectural changes to explicit token store (ETS) dataflow hardware, only a shift in mindset when reasoning about how that hardware works. Rather than viewing computational progress as the consumption of tokens and the firing of enabled instructions, we instead reason about the evolution of multiple, interacting sequential threads, where forking and joining are extremely efficient. Because this new paradigm has proven so valuable in coding resource management operations and in improving code efficiency, it is now the cornerstone of the Monsoon instruction set architecture and macro assembly language. In retrospect, this suggests that there is a continuum of multithreaded architectures, with pure ETS dataflow and single threaded von Neumann at the extrema. We use this new perspective to better understand the relative strengths and weaknesses of the Monsoon implementation.
doi:10.1145/115953.115986
Citations: 142
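The cheap join the paper describes is a hardware operation on an ETS frame slot: each arriving thread bumps a presence count, and only the last arrival fires the successor. Its essence can be sketched in software (the class and its shape are an illustration, not Monsoon's mechanism):

```python
import threading

class Join:
    """Dataflow-style join: `arity` threads arrive at a shared frame
    slot; only the last arrival continues into the successor code."""
    def __init__(self, arity):
        self.arity = arity
        self.count = 0
        self.lock = threading.Lock()

    def arrive(self):
        with self.lock:
            self.count += 1
            return self.count == self.arity   # True only for the last thread

results = []
join = Join(2)

def worker(value):
    # Each forked thread does its work, then arrives at the join;
    # exactly one of them runs the continuation.
    if join.arrive():
        results.append(value)

threads = [threading.Thread(target=worker, args=(v,)) for v in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread arrives second runs the continuation, so exactly one continuation fires regardless of scheduling order, which is the synchronization guarantee the token-matching hardware provides for free.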