
Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing: Latest Articles

Approximation algorithms for time constrained scheduling
K. Jansen, Sabine R. Öhring
In this paper we consider the following time-constrained scheduling problem. Given a set of jobs J with execution times e(j) ∈ (0, …
DOI: https://doi.org/10.1109/ICAPP.1995.472254 | Published: 1995-09-04
Citations: 95
A general definition of deadlocks for distributed systems
J. Brzeziński, J. Hélary, M. Raynal
This paper is about the definition of deadlocks in asynchronous message communication systems. The considered system model covers unspecified receptions, non-FIFO channels, and general resource (message) requests including, among others, AND, OR, AND-OR and k-out-of-n requests.
DOI: https://doi.org/10.1109/ICAPP.1995.472201 | Published: 1995-06-01
Citations: 0
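The AND, OR and k-out-of-n request models listed in the deadlock entry above can be made concrete with a small graph-reduction check. This is a sketch under stated assumptions, not the authors' definition (which also covers unspecified receptions and non-FIFO channels): it assumes every active process eventually replies to anyone waiting on it, ignores messages in transit, and does not model AND-OR requests. The names `Request` and `deadlocked` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A blocked process waits for messages from `targets` and is released once
    any `k` of them arrive. AND model: k == len(targets); OR model: k == 1."""
    targets: frozenset
    k: int


def deadlocked(active, blocked):
    """Return the set of blocked processes that can never be released.

    active:  names of processes that are not waiting (assumed to reply eventually).
    blocked: dict mapping a process name to its Request.
    Graph reduction: a blocked process whose request can be met by processes
    already known to be releasable becomes releasable itself; iterate to a
    fixed point and report whatever is still blocked.
    """
    releasable = set(active)
    pending = dict(blocked)
    changed = True
    while changed:
        changed = False
        for proc, req in list(pending.items()):
            if len(req.targets & releasable) >= req.k:
                releasable.add(proc)
                del pending[proc]
                changed = True
    return set(pending)


# Hypothetical scenario: P4 is active; P1 waits for P2 AND P3, P2 waits for
# P1 OR P3, P3 waits for any 2 of {P1, P2, P4}.
example = {
    "P1": Request(frozenset({"P2", "P3"}), 2),
    "P2": Request(frozenset({"P1", "P3"}), 1),
    "P3": Request(frozenset({"P1", "P2", "P4"}), 2),
}
print(sorted(deadlocked({"P4"}, example)))  # ['P1', 'P2', 'P3']
```

Lowering P3's request to 1-out-of-3 lets P4's reply release P3, which in turn releases P2 and then P1, so the same call returns an empty set.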
Modeling and efficient distributed simulation of multistage interconnection networks
Y. M. Teo, S. Tay
Multistage interconnection networks are used in a number of application areas, such as parallel computers and high-speed communication systems. Because the performance of these systems depends on an efficient design of the interconnection network, a thorough analysis of the network's performance is important. Mathematical analysis has so far provided inadequate results, and simulation on a uniprocessor usually requires extremely long run times to evaluate large networks. This paper addresses the use of parallel simulation techniques to speed up the simulation of multistage interconnection networks. The conventional null-message approach to resolving deadlock in conservative simulation may cause livelock if lookahead is not guaranteed. We propose a deadlock- and livelock-free scheme that uses null messages, without requiring guaranteed lookahead, to coordinate the simulation, together with different partitioning techniques for mapping the simulation program onto multicomputers. A flushing mechanism is also used to resolve the null-message explosion problem. Our analysis shows that the proposed flushing mechanism effectively reduces the number of null messages from exponential to linear.
DOI: https://doi.org/10.1109/ICAPP.1995.472173 | Published: 1995-04-19
Citations: 2
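The null-message approach mentioned in the multistage-network entry belongs to the conservative (Chandy-Misra-Bryant style) family of parallel simulation. The sketch below shows one logical process under stated assumptions: each input channel is FIFO with non-decreasing timestamps, simultaneous events may be processed in any order, and a fixed positive `lookahead` bounds how soon this process can emit anything. It is a generic illustration, not the authors' deadlock/livelock-free scheme or their flushing mechanism; `LogicalProcess` and its methods are hypothetical names.

```python
import heapq
import itertools


class LogicalProcess:
    """One logical process (LP) of a conservative, null-message simulation."""

    def __init__(self, name, inputs, lookahead):
        self.name = name
        self.lookahead = lookahead
        # Timestamp of the last message (real or null) seen on each input channel.
        self.channel_clock = {src: 0.0 for src in inputs}
        self.queue = []               # heap of (timestamp, seq, payload)
        self.seq = itertools.count()  # tie-breaker so payloads are never compared
        self.now = 0.0

    def receive(self, src, timestamp, payload=None):
        """Accept a real message (payload given) or a null message (payload None)."""
        self.channel_clock[src] = timestamp
        if payload is not None:
            heapq.heappush(self.queue, (timestamp, next(self.seq), payload))

    def safe_time(self):
        """No input can still arrive with a timestamp below this bound."""
        return min(self.channel_clock.values())

    def step(self):
        """Process all provably safe events; return them together with the
        timestamp of the null message to send on every output channel."""
        horizon = self.safe_time()
        processed = []
        while self.queue and self.queue[0][0] <= horizon:
            ts, _, payload = heapq.heappop(self.queue)
            self.now = ts
            processed.append(payload)
        self.now = max(self.now, horizon)
        # Promise to neighbours: nothing earlier than this will be sent.
        return processed, self.now + self.lookahead
```

With `lookahead = 0`, the returned null-message timestamp never exceeds the current safe time, so a cycle of logical processes can keep exchanging null messages without advancing simulated time; this is the livelock risk the abstract refers to when lookahead is not guaranteed.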
Parsim-message PAssing computeR SIMulator
A. Symons, V. Narasimhan
Predicting the performance of an application on a new platform, given its performance on a particular computer system, is fairly difficult. A significant degree of exhaustive modeling of both algorithms and architectures is required before a reasonable prediction can be attempted. This paper describes Parsim, a general-purpose simulator for message-passing multiprocessors that facilitates system modeling. Besides monitoring a number of system parameters, Parsim permits easy reconfiguration of topology and algorithm mappings. Parsim has been used to predict the performance of a number of different algorithms, such as the Fast Fourier Transform, Livermore loops and integer programming, on a variety of topologies, such as transputer hypercubes, transputer meshes and a cluster of workstations on an Ethernet backbone.
DOI: https://doi.org/10.1109/ICAPP.1995.472249 | Published: 1995-04-19
Citations: 2
Performance analysis of median filtering on Meiko-a distributed multiprocessor system
K. M. Poon, N. Yung
This paper presents a performance analysis of median filtering implemented on a distributed multiprocessor system. The results give a good indication of the performance gain of using a multiprocessor rather than a uniprocessor for median filtering. As shown by varying the image size, this gain is proportional to the problem size. Furthermore, the analysis shows that computation time and inter-processor communication scale well with the number of processors in the system. However, overall system performance does not scale in the same way, because initialization overhead dominates the computation time once the number of processors increases beyond a certain point. Because of this relationship, optimal performance is achieved with a particular number of processors, and this number varies with the problem size. In addition, the subimage model is found to be an acceptable approach for this type of processing, since only the necessary parts of the image are sent to the other processors. The master/slave scheme proves easy to program and convenient for control and data manipulation. Overall, this type of non-linear processing seems to fit well into the MIMD architecture.
DOI: https://doi.org/10.1109/ICAPP.1995.472250 | Published: 1995-04-19
Citations: 7
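The subimage model in the median-filtering entry, where the master sends each slave only the part of the image it needs, can be sketched as horizontal strips padded with one halo row on each side. Assumptions of this sketch: a 3x3 window, border pixels copied unchanged, and the slaves simulated by a sequential loop; `median3x3`, `split_with_halo` and `filter_parallel` are hypothetical names, not taken from the paper.

```python
from statistics import median


def median3x3(img):
    """Sequential 3x3 median filter on a list-of-lists image.
    Border pixels (first/last row and column) are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


def split_with_halo(img, workers):
    """Master side: cut the image into horizontal strips. Each strip carries one
    halo row above and below (where they exist), so a worker needs no other data."""
    h = len(img)
    parts = []
    for i in range(workers):
        lo, hi = i * h // workers, (i + 1) * h // workers  # rows this worker owns
        top, bottom = max(lo - 1, 0), min(hi + 1, h)        # rows actually sent
        parts.append((lo, hi, top, [row[:] for row in img[top:bottom]]))
    return parts


def filter_parallel(img, workers):
    """Filter every strip independently (the slave step, run sequentially here)
    and let the master reassemble only the rows each worker owns."""
    result = []
    for lo, hi, top, strip in split_with_halo(img, workers):
        filtered = median3x3(strip)
        result.extend(filtered[lo - top:hi - top])
    return result
```

Because every strip carries its neighbouring rows, the reassembled result matches the sequential filter exactly; a (2r+1)x(2r+1) window would need r halo rows per side instead of one.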
Parallel object-oriented programming with multiple inheritance: language design issues
A. Radenski
The transition from sequential object-oriented programming (OOP) to parallelism has been the focus of active research. Experimental languages that try to integrate objects and parallelism are often seriously compromised in their ability to provide inheritance for parallel objects. Even languages that permit some amalgamation of parallelism and inheritance tend to support only single-class inheritance. The purpose of this paper is to specify a strongly typed language framework for parallel object-oriented programming that provides easy-to-use multiple inheritance for parallel objects, including inheritance for synchronization code. The proposed approach to parallelism is based on "separate" methods, which generate processes and provide rendezvous-type coordination; it succeeds in cases where known languages fail to combine inheritance with parallelism, or do so only inefficiently and inconveniently.
DOI: https://doi.org/10.1109/ICAPP.1995.472169 | Published: 1995-04-19
Citations: 1
Packet multiplexing: an efficient router implementation for adaptive mesh networks
C. Izu, R. Beivide, A. Arruabarrena, J. Gregorio
The performance of an interconnection network with adaptive routing is strongly related to the deadlock-avoidance method it applies. Virtual channels are normally used for this purpose in mesh and torus networks. This work compares true architectural alternatives in router design: mapping each virtual channel onto a different physical link, and multiplexing the set of virtual channels onto the same physical link. In addition, multiplexing at the packet level is proposed as an alternative to multiplexing at the flit level, and the advantages of this previously unused approach are shown. The benefits of each multiplexing type, in both message latency and throughput, have been evaluated under several traffic conditions. The node delay of each implementation scheme has also been estimated.
DOI: https://doi.org/10.1109/ICAPP.1995.472247 | Published: 1995-04-19
Citations: 1
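The difference between flit-level and packet-level multiplexing in the router entry can be illustrated with a toy model of one physical link shared by several virtual channels. The model is an assumption of this sketch, not the paper's simulator: one flit crosses the link per cycle, each virtual channel holds exactly one packet, arbitration is round-robin, and buffers, credits and routing are ignored. `link_schedule` and `completion_cycles` are hypothetical helpers.

```python
from collections import deque


def link_schedule(packets, packet_level):
    """Order in which flits cross one physical link shared by virtual channels.

    packets: list of (vc_name, n_flits), one queued packet per virtual channel.
    packet_level=False: flit-level multiplexing, round-robin, one flit per turn.
    packet_level=True:  packet-level multiplexing, a VC keeps the link until
                        its whole packet has been sent.
    """
    queue = deque((vc, n) for vc, n in packets if n > 0)
    order = []
    while queue:
        vc, remaining = queue.popleft()
        burst = remaining if packet_level else 1
        order.extend([vc] * burst)  # one flit per cycle
        if remaining > burst:
            queue.append((vc, remaining - burst))
    return order


def completion_cycles(order):
    """Cycle (1-based) at which each VC's last flit crosses the link."""
    done = {}
    for cycle, vc in enumerate(order, start=1):
        done[vc] = cycle
    return done
```

For packets of 3 and 2 flits on VCs `A` and `B`, flit-level multiplexing yields A, B, A, B, A (last flits at cycles 5 and 4), while packet-level multiplexing yields A, A, A, B, B (cycles 3 and 5), so in this toy case the mean completion cycle drops from 4.5 to 4.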
A study of the communication cost of the FFT on torus multicomputers
L. D. D. Cerio, M. Valero-García, Antonio Gonzalez
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Several approaches are proposed that differ in how they use the interconnection network. The first approach is based on the multidimensional index mapping technique for FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations, and a novel methodology for pipelining communication operations on a torus is proposed. Analytical models are presented to compare the approaches. The comparison shows that the best approach depends on the number of dimensions of the torus and on the communication start-up and transfer times. The analytical models allow the most efficient approach to be selected for the available machine.
DOI: https://doi.org/10.1109/ICAPP.1995.472178 | Published: 1995-04-19
Citations: 6
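The first approach in the FFT entry rests on the multidimensional index mapping of the FFT. For N = N1·N2, writing the input index as n = N2·n1 + n2 and the output index as k = k1 + N1·k2 splits a length-N DFT into N2 DFTs of length N1, a twiddle-factor multiplication, and N1 DFTs of length N2; on a multicomputer, regrouping the data between the two sets of short DFTs is the step that uses the interconnection network. The sketch below shows only this two-factor arithmetic in plain Python, with naive DFTs for the short transforms. It does not model the torus mapping or the communication pipelining studied in the paper, and `dft` and `fft_index_mapped` are illustrative names.

```python
import cmath


def dft(seq):
    """Naive O(M^2) DFT, used for the short sub-transforms and as a reference."""
    m = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * t * k / m) for t in range(m))
            for k in range(m)]


def fft_index_mapped(x, n1_size, n2_size):
    """Length-N DFT with N = n1_size * n2_size via the index map
    n = n2_size*i1 + i2 (input) and k = k1 + n1_size*k2 (output):

    X[k1 + N1*k2] = sum_i2 W_N2^(i2*k2) * W_N^(i2*k1) * sum_i1 x[N2*i1 + i2] * W_N1^(i1*k1)
    where W_M = exp(-2j*pi/M).
    """
    N = n1_size * n2_size
    if len(x) != N:
        raise ValueError("len(x) must equal n1_size * n2_size")
    # Step 1: n2_size independent length-n1_size DFTs over strided inputs.
    inner = [dft([x[n2_size * i1 + i2] for i1 in range(n1_size)])
             for i2 in range(n2_size)]  # inner[i2][k1]
    # Step 2: twiddle factors W_N^(i2*k1).
    for i2 in range(n2_size):
        for k1 in range(n1_size):
            inner[i2][k1] *= cmath.exp(-2j * cmath.pi * i2 * k1 / N)
    # Step 3: after regrouping by k1 (the data exchange on a parallel machine),
    # n1_size independent length-n2_size DFTs.
    X = [0j] * N
    for k1 in range(n1_size):
        outer = dft([inner[i2][k1] for i2 in range(n2_size)])
        for k2 in range(n2_size):
            X[k1 + n1_size * k2] = outer[k2]
    return X
```

Applying the same split recursively to each short DFT, one factor per torus dimension, is what turns this two-factor map into a multidimensional one.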
ONBAM: an objective-neuron-based active memory
T. Ae, T. Tsyosaki, H. Fukumoto, K. Sakai
We propose an objective-neuron-based active memory, ONBAM, and design it using an HDL. ONBAM is an active topological memory, that is, a memory that is equivalent to a topological space of data and has a flexible content-addressable function; it plays an important role in the memory-based AI machine.
DOI: https://doi.org/10.1109/ICAPP.1995.472190 | Published: 1995-04-19
Citations: 3
The effect of adding a scalar D-cache to the Cray-4 vector processor
S. Beaty, G. Johnson
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled with fast scalar processors. Processor speed has increased at a greater rate than memory speed; indeed, current vector processors have cycle times far shorter than those of the memories they are connected to. When compilers can predict memory access patterns, they vectorize computations and thereby hide the processor/memory disparity. When memory access patterns are not known until run time, caches can pay large dividends. This paper studies the effects of adding a scalar data cache to a modern vector processor and reports some encouraging results.
DOI: https://doi.org/10.1109/ICAPP.1995.472189 | Published: 1995-04-19
Citations: 1
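The abstract of the Cray-4 entry does not describe the cache organization, so the sketch below assumes the simplest case, a direct-mapped cache indexed by word address, to show how a hit rate h is measured over an address trace and how it feeds an average access time, t_avg = h·t_hit + (1 - h)·t_miss. The function names and all cycle counts are hypothetical.

```python
def hit_rate(addresses, num_lines, line_words):
    """Hit rate of a direct-mapped cache over a trace of word addresses.

    num_lines:  number of cache lines (hypothetical parameter).
    line_words: words per cache line (hypothetical parameter).
    """
    tags = [None] * num_lines
    hits = 0
    for addr in addresses:
        block = addr // line_words   # memory block holding this word
        index = block % num_lines    # the only line that block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag        # miss: fetch the block, evicting the old one
    return hits / len(addresses) if addresses else 0.0


def effective_access_time(h, t_hit, t_miss):
    """Average access time for hit rate h, hit time t_hit and miss time t_miss."""
    return h * t_hit + (1 - h) * t_miss
```

With 4 lines of 8 words, a sequential sweep over `range(64)` misses once per line (hit rate 0.875), while alternating between addresses 0 and 32, which map to the same line, misses on every access.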