
Proceedings. Advances in Parallel and Distributed Computing: Latest Articles

The study of parallel simulation processing based on MPP technology
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574011
Yang Shi, Chenxi Zhang, Chunyuan Zhang
Computer numerical simulation is widely applied in engineering and social fields, where it has shown great value. Small-scale simulation applications can be processed on a traditional simulation computer, but as problem size increases, sequential processing cannot meet the requirements. Dynamic real-time simulation and super real-time simulation require high-performance simulation computers. In this paper we first analyse the structure of a classical simulation computer, the AD-100 developed by ADI Inc., and then propose a novel structure for a simulation computer that adopts MPP technology. At the end of the paper an experimental result is given to test the feasibility of parallel simulation processing.
Citations: 0
Parallel replacement mechanism for multithread
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574052
Guangzuo Cui, Mingzeng Hu, Xiaoming Li
This paper presents a new rapid thread replacement mechanism, which is important in multithread technology. Analysis of the memory system indicates that memory utilization decreases as the cache hit ratio increases. The parallelism between thread computation and thread replacement is found by analyzing their working processes. Based on these observations, we propose a rapid multithread replacement mechanism which overlaps thread replacement with thread computation. In particular, with finite hardware contexts, this mechanism can play the same role as infinite contexts by tolerating the replacement overhead. By modifying the general thread switching model, we build the thread replacement model and evaluate this mechanism both theoretically and experimentally. Finally, we discuss the hardware implementation and put forward the problems to be resolved in the future.
Citations: 0
On the optimization by redundancy using an extended LogP model
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574026
Jörn Eisenbiegler, Welf Löwe, A. Wehrenpfennig
We present a strategy for optimizing parallel algorithms introducing redundant computations. In order to calculate the optimal amount of redundancy, we generalize the LogP model to capture messages of varying sizes using functions instead of constants for the machine parameters. We validate our method for a wave simulation algorithm on a Parsytec PowerXplorer with eight processors and a workstation cluster with four workstations.
Citations: 17
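The abstract above describes replacing the constant LogP machine parameters with functions of message size. A minimal sketch of that idea follows, assuming the usual LogP end-to-end time for one message (sender overhead, plus latency, plus receiver overhead). The class `ExtendedLogP`, its field names, and the helper `prefer_redundancy` are hypothetical illustrations; the abstract does not give the authors' actual parameter functions or decision rule.

```python
from dataclasses import dataclass
from typing import Callable

# Each parameter is a function of message size n (in bytes) instead of a constant.
SizeFn = Callable[[int], float]


@dataclass
class ExtendedLogP:
    """Hypothetical LogP-style machine description with size-dependent parameters."""
    latency: SizeFn        # L(n): network latency
    send_overhead: SizeFn  # o_s(n): CPU time spent by the sender
    recv_overhead: SizeFn  # o_r(n): CPU time spent by the receiver
    gap: SizeFn            # g(n): minimum interval between consecutive sends
    processors: int        # P: number of processors

    def message_time(self, n: int) -> float:
        """End-to-end time for a single message of n bytes."""
        return self.send_overhead(n) + self.latency(n) + self.recv_overhead(n)


def prefer_redundancy(machine: ExtendedLogP, n: int, recompute_time: float) -> bool:
    """Assumed rule: recompute a value locally if that is cheaper than receiving it."""
    return recompute_time < machine.message_time(n)
```

With linear parameter functions, for example `latency=lambda n: 10.0 + 0.01 * n`, the sketch shows why the break-even point between communicating and recomputing moves with message size, which constant parameters cannot express.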
Eliminating two kinds of data flow inaccuracy in the presence of pointer aliasing
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574063
Qiang Liu, Zhaoqing Zhang, Xiaomei Ji
Programming languages with sophisticated pointer usage, such as C, are hard to analyze. Recent research on pointer analysis focuses on tracking the possible values of pointers when a program point is reached, and great progress has been achieved. However, how to apply the results of pointer analysis to dataflow analysis and to other program optimization and parallelization has not been well studied. This paper presents an efficient interprocedural framework based on two insights into real C programs, and its use in deriving a context-sensitive pointer analysis algorithm and an accurate interprocedural modification side effects (MOD) computation. Based on the result of the pointer analysis, the inaccuracy induced by merging aliasing information is also studied.
Citations: 1
An environment for the parallel execution of multigrain clustered tasks
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574050
Jean-Noel Colin
In this paper, we present an original approach for the design and execution of distributed applications that require numerous tasks of variable grain. The approach is based on the concept of a task cluster, an entity that groups tasks with strong logical interaction and guarantees efficient communication between them. We describe the implementation of the model, which mainly relies on lightweight processes as support for the distributed tasks. We also illustrate the use of the proposed approach on real-size applications, where it has improved both the ease of design and the performance.
Citations: 1
Efficiency issues of a parallel FEM implementation on shared memory computers
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574027
L. Grabowsky, W. Rehm
In the field of parallel FEM methods a number of highly efficient solutions for distributed memory systems exist, but the passage to adaptive parallel FEM simulations leads, in all probability, to more dynamic behaviour with respect to data placement and load balancing. A shared-memory architecture therefore seems to be a more appropriate basis for efficient implementations. This paper presents a parallelized CG method for shared memory systems, which was implemented on a 4-processor SMP system and makes explicit use of shared memory to enhance the communication between different domains. It is based on an idea for implementing parallelization on distributed memory systems and represents an appropriate modification of this method. The results show that an increased synchronization expense can partially offset the advantages of shared memory communication, depending on the levels of refinement and the number of processors.
Citations: 3
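For readers unfamiliar with the CG (conjugate gradient) method named in the abstract above, the following is a minimal sequential textbook sketch for a symmetric positive definite system. It is not the authors' parallel implementation; the function names are hypothetical. The comments mark the matrix-vector product and the dot products, which are the operations a domain-decomposed version would distribute and which require a synchronization (reduction) step.

```python
def dot(u, v):
    """Inner product; in a parallel version this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; in FEM codes A is sparse and split by domain."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration needs two reductions (the two dot products), which is one reason synchronization cost grows with the number of processors.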
A dual-processors multithreaded architecture and its driven execution model
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574035
Liquan Xiao, Weixia Xu, Xingming Zhou
Software overhead, which includes interprocess communication latency and the overhead of managing processes or threads, is a crucial factor affecting the performance of massively parallel processor systems. A multithreaded architecture can effectively reduce and hide the software overhead. Many models need to be implemented inside a microprocessor. In contrast, this paper addresses a multithreaded architecture adopted for current microprocessors and implements the architecture using a hardware description language. Furthermore, the paper presents its driven execution model and evaluates the performance of the presented multithreading system using a trace-driven simulator.
Citations: 0
Analysis of multidimensional loops with non-uniform dependences
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574056
J. Sogno
For a parallelizing compiler, mainly based on loop transformations, dependence information that is as complete and precise as possible is required. In this paper, we propose a generalized method for computing, in any multi-dimensional loop, information which proved to be useful in the case of irregular dependences. Firstly, we solve the basic problem of the existence of a dependence with an algorithm composed of a preprocessing phase of reduction and of an integer simplex resolution. If a solution exists, we compute by integer simplex the bounds of the distances associated with loop indices. Depending on the values of these bounds, we finally define problems consisting in evaluating the bounds of slopes of dependence vectors, which we solve by integer linear fractional programming. The amount of computation for each new problem is very low. This algorithm has been implemented as an extension of the Janus Test, which was presented in a previous work.
Citations: 4
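To make the notion of distance bounds in the abstract above concrete, here is a small brute-force sketch. It enumerates every iteration pair of a two-deep loop nest instead of using the integer simplex method the paper describes, so it is only practical for tiny loop bounds. The function name, its arguments, and the restriction to flow (write-then-read) dependences are assumptions made for illustration.

```python
from itertools import product


def dependence_distance_bounds(n, write_idx, read_idx):
    """Return ((min_di, max_di), (min_dj, max_dj)) over all flow dependences,
    or None if no iteration reads an element written by an earlier iteration.

    The loop nest is `for i in range(n): for j in range(n)`; each iteration
    writes the array element write_idx(i, j) and reads read_idx(i, j).
    """
    # Map each written element to the iterations that write it.
    writers = {}
    for it in product(range(n), repeat=2):
        writers.setdefault(write_idx(*it), []).append(it)

    distances = []
    for sink in product(range(n), repeat=2):
        for src in writers.get(read_idx(*sink), []):
            if src < sink:  # lexicographic order: the write runs before the read
                distances.append((sink[0] - src[0], sink[1] - src[1]))

    if not distances:
        return None
    return tuple(
        (min(d[k] for d in distances), max(d[k] for d in distances))
        for k in range(2)
    )
```

For a uniform case such as writing `A[i][j]` and reading `A[i-1][j-1]`, every distance is the same vector. Writing `A[2*i][j]` while reading `A[i][j]` gives distances that vary with `i`, which is the non-uniform situation the paper targets.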
Control mechanism for software pipelining on nested loop
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574053
Tao Yu, Zhizhong Tang, Chihong Zhang, Jun Luo
ILSP (Interlaced inner and outer Loop Software Pipelining) is an efficient algorithm for optimizing operations in nested loops. For ILSP to achieve good time and space efficiency, an efficient nested-loop control mechanism must support the algorithm. Our control mechanism is realized in hardware; it avoids adding many extra instructions and minimises the II (initiation interval) of each loop in the nest. In cooperation with the compiler, our nested loop control mechanism can efficiently support software pipelining of nested loops and can ensure that ILSP achieves a high speedup at a low space cost.
Citations: 5
A versatile directory scheme (Dir₂NB+L) and its implementation on BY91-1 multiprocessors system
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574030
Tao Li, Ben-Wei Rong
Cache coherence and synchronization between processors have been two critical issues in designing a shared memory multiprocessor system. From the perspective of hardware design, a directory-based cache coherence protocol and a lock mechanism are employed to prevent cache inconsistency and guarantee atomic memory accesses. The BY91-1 multiprocessors efficiently integrate support for cache coherence and hardware-based primitives by using a uniform directory scheme dubbed Dir₂NB+L. This integration allows for low hardware overhead while maintaining both a coherent cache system and indivisible memory accesses in a scalable and cohesive fashion. This paper describes the design and rationale of this versatile directory scheme. Results from evaluating different directory schemes on a preliminary simulator, CASIMU, demonstrate that the Dir₂NB+L scheme is cost-effective. We also report on the experience gained by implementing this directory scheme on the BY91-1 multiprocessor system. We believe that this scheme is well suited for CC-NUMA architectures.
Citations: 1
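The directory scheme named in the abstract above can be pictured with a toy model. In the common Dir_i NB notation, a directory entry holds at most i sharer pointers and never broadcasts, so adding a sharer beyond the limit forces an existing copy to be invalidated. The sketch below assumes i = 2, assumes that "+L" denotes a lock field kept in the same entry, and assumes an oldest-first eviction policy; the class and method names are hypothetical and the abstract does not describe the real BY91-1 hardware at this level.

```python
class Dir2NBLEntry:
    """Toy directory entry for one memory block: two sharer pointers plus a lock."""

    MAX_SHARERS = 2  # the "2" in Dir2NB: at most two cached copies are tracked

    def __init__(self):
        self.sharers = []       # processor ids holding a copy, oldest first
        self.lock_owner = None  # assumed "+L" extension: id of the lock holder

    def read(self, cpu):
        """Record a read by cpu; return the list of processors to invalidate."""
        if cpu in self.sharers:
            return []
        invalidated = []
        if len(self.sharers) == self.MAX_SHARERS:
            # No broadcast (NB): free a pointer by invalidating one tracked copy.
            invalidated.append(self.sharers.pop(0))
        self.sharers.append(cpu)
        return invalidated

    def write(self, cpu):
        """Record a write by cpu; every other tracked copy is invalidated."""
        invalidated = [s for s in self.sharers if s != cpu]
        self.sharers = [cpu]
        return invalidated

    def try_lock(self, cpu):
        """Acquire the lock if it is free; return whether cpu now holds it."""
        if self.lock_owner is None:
            self.lock_owner = cpu
            return True
        return False

    def unlock(self, cpu):
        """Release the lock, but only if cpu is the current holder."""
        if self.lock_owner == cpu:
            self.lock_owner = None
```

Keeping the lock state in the directory entry means a lock request and a coherence request travel to the same place, which is one plausible reading of the "uniform" scheme the abstract mentions.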