Proceedings. Advances in Parallel and Distributed Computing: Latest Articles

Eliminating two kinds of data flow inaccuracy in the presence of pointer aliasing
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574063
Qiang Liu, Zhaoqing Zhang, Xiaomei Ji
Programming languages that make sophisticated use of pointers, such as C, are hard to analyze. Recent research on pointer analysis has focused on tracking the possible values of pointers at each program point, and great progress has been achieved. However, how to apply the results of pointer analysis to dataflow analysis and to other program optimization and parallelization is not well studied. This paper presents an efficient interprocedural framework based on two insights into real C programs, and its use in deriving a context-sensitive pointer analysis algorithm and an accurate interprocedural modification side effects (MOD) computation. Based on the results of the pointer analysis, the inaccuracy induced by merging aliasing information is also studied.
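As a rough illustration of the MOD problem mentioned in this abstract, the Python sketch below computes which variables each procedure may modify, given points-to sets for dereferenced pointers and a call graph. It is a toy, flow- and context-insensitive fixpoint, not the authors' algorithm; the function name `compute_mod`, the input dictionaries, and the example program are all hypothetical, and local-variable scoping is ignored.

```python
def compute_mod(direct_writes, deref_writes, points_to, calls):
    """Return {procedure: set of variables it may modify}.

    direct_writes: {proc: variables assigned by name}
    deref_writes:  {proc: pointers written through, as in *p = ...}
    points_to:     {pointer: variables it may point to}
    calls:         {proc: list of callees}; every procedure must be a key
    """
    mod = {}
    for proc in calls:
        written = set(direct_writes.get(proc, ()))
        # A store through *p may modify anything p may point to.
        for ptr in deref_writes.get(proc, ()):
            written |= points_to.get(ptr, set())
        mod[proc] = written

    # Propagate callee side effects to callers until nothing changes.
    changed = True
    while changed:
        changed = False
        for proc, callees in calls.items():
            for callee in callees:
                new = mod[callee] - mod[proc]
                if new:
                    mod[proc] |= new
                    changed = True
    return mod


# Hypothetical program: main writes x and calls f; f calls g; g does *p = ...
calls = {"main": ["f"], "f": ["g"], "g": []}
mod = compute_mod(
    direct_writes={"main": {"x"}},
    deref_writes={"g": {"p"}},
    points_to={"p": {"a", "b"}},
    calls=calls,
)
print(sorted(mod["main"]))
```

In this sketch, the larger the points-to set for `p`, the larger every caller's MOD set becomes, which is the kind of inaccuracy that merged aliasing information can introduce.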
Citations: 1
An environment for the parallel execution of multigrain clustered tasks
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574050
Jean-Noel Colin
In this paper, we present an original approach for the design and execution of distributed applications that require numerous tasks of variable grain. The approach is based on the concept of a task cluster, an entity that groups tasks with strong logical interaction and guarantees efficient communication between them. We describe the implementation of the model, which relies mainly on lightweight processes to support the distributed tasks. We also illustrate the use of the proposed approach on real-size applications, where it has improved both ease of design and performance.
Citations: 1
Enlarging the scope of vector-based computations: extending Fortran 90 by nested data parallelism
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574015
K. T. Au, M. Chakravarty, J. Darlington, Yike Guo, Stefan Jähnichen, Martin Köhler, G. Keller, W. Pfannenstiel, M. Simons
This paper describes the integration of nested data parallelism into Fortran 90. Unlike flat data parallelism, nested data parallelism directly provides means for handling irregular data structures and certain forms of control parallelism, such as divide-and-conquer algorithms, thus enabling the programmer to express such algorithms far more naturally. Existing work deals with nested data parallelism in a functional environment, which does help avoid a set of problems but makes efficient implementations more complicated. Moreover, functional languages are not readily accepted by programmers used to languages such as Fortran and C, which are currently predominant in programming parallel machines. In this paper, we introduce the imperative data-parallel language Fortran 90V and give an overview of its implementation.
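To make the contrast between flat and nested data parallelism concrete, here is a minimal Python sketch of flattening, a classic way to run nested operations on flat vectors. The abstract does not say how Fortran 90V is implemented, so this is only an illustration of the general idea; `flatten` and `segmented_sum` are hypothetical helper names.

```python
def flatten(nested):
    """Turn an irregular nested list into a flat data vector plus segment lengths."""
    data = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return data, lengths


def segmented_sum(data, lengths):
    """Sum each segment of the flat vector; every segment is independent work."""
    sums = []
    start = 0
    for n in lengths:
        sums.append(sum(data[start:start + n]))
        start += n
    return sums


# Irregular rows (including an empty one) become one flat vector.
rows = [[1, 2, 3], [], [4, 5]]
data, lengths = flatten(rows)
print(data, lengths)
print(segmented_sum(data, lengths))
```

The segment-length vector carries the irregular structure, so the element data itself stays in a single array that a vector machine could process in bulk.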
Citations: 13
The design considerations and test results of AFT - a new generation parallelizing compiler
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574064
Chuanqi Zhu, B. Zang, Tong Chen
An effective automatic parallelizer is critical for users to exploit the resources of parallel computers. Research in this area has made much progress in recent years. This paper introduces AFT, a new-generation parallelizing compiler that we have developed. It integrates many advanced techniques into an effective and efficient system. The experimental results show that AFT is able to achieve notable parallelization on many programs.
Citations: 3
Definition of control variables for automatic performance modeling
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574012
H. Mierendorff, Helmut Schwamborn
Automatic model generation is studied as part of a hybrid modeling strategy using simulation for performance analysis. Two major steps have to be carried out in this context. First, the program being investigated has to be translated into a model. During the translation, runtime has to be estimated for numerous computational blocks of statements, which are replaced by simple delays. Second, for performance estimation, the model has to be analyzed by an evaluation tool. Model evaluation as well as runtime estimation of computational blocks requires the values of some variables, the control variables. We discuss the problem of automatic definition of control variables in general and consider some important cases. For the implementation of a model-generating tool, we concentrate on parallel Fortran programs using message-passing primitives for process communication.
Citations: 4
A dual-processors multithreaded architecture and its driven execution model
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574035
Liquan Xiao, Weixia Xu, Xingming Zhou
Software overhead, which includes interprocess communication latency and the overhead of managing processes or threads, is a crucial factor affecting the performance of massively parallel processor systems. A multithreaded architecture can effectively reduce and hide this overhead. Many multithreading models need to be implemented inside a microprocessor. In contrast, this paper addresses a multithreaded architecture adopted for current microprocessors and implements the architecture using a hardware description language. Furthermore, the paper presents its driven execution model and evaluates the performance of the presented multithreading system using a trace-driven simulator.
Citations: 0
Efficiency issues of a parallel FEM implementation on shared memory computers
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574027
L. Grabowsky, W. Rehm
In the field of parallel FEM methods, a number of highly efficient solutions for distributed memory systems exist, but the move to adaptive parallel FEM simulations leads, in all probability, to more dynamic behaviour with respect to data placement and load balancing. A shared-memory architecture therefore seems to be a more appropriate basis for efficient implementations. This paper presents a parallelized CG method for shared memory systems, which was implemented on a 4-processor SMP system and makes explicit use of shared memory to enhance the communication between different domains. It is based on an idea for implementing parallelization on distributed memory systems and represents an appropriate modification of this method. The results show that increased synchronization expense can partially offset the advantages of shared memory communication, depending on the levels of refinement and the number of processors.
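For readers unfamiliar with the CG (conjugate gradient) method named in this abstract, the following is a minimal sequential sketch in plain Python for a small symmetric positive definite system. It is not the authors' parallel implementation; the function names, tolerance, and test matrix are assumptions chosen for illustration. In a parallel version, the matrix-vector product and the dot products are the steps that require communication and synchronization between domains.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A (list of rows)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small example system; the exact solution is x = (1/11, 7/11).
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)
```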
Citations: 3
A versatile directory scheme (Dir₂NB+L) and its implementation on BY91-1 multiprocessors system
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574030
Tao Li, Ben-Wei Rong
Cache coherence and synchronization between processors have been two critical issues in designing a shared memory multiprocessor system. From the perspective of hardware design, a directory-based cache coherence protocol and a lock mechanism are employed to prevent cache inconsistency and guarantee atomic memory accesses. The BY91-1 multiprocessor efficiently integrates support for cache coherence and hardware-based primitives by using a uniform directory scheme dubbed Dir₂NB+L. This integration allows for low hardware overhead while maintaining both a coherent cache system and indivisible memory accesses in a scalable and cohesive fashion. This paper describes the design and rationale of this versatile directory scheme. Results from evaluating different directory schemes on a preliminary simulator, CASIMU, demonstrate that the Dir₂NB+L scheme is cost-effective. We also report on the experience gained by implementing this directory scheme on the BY91-1 multiprocessor system. We believe that this scheme is well suited to CC-NUMA architectures.
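The sketch below illustrates, in Python, how a limited-pointer directory entry without broadcast might behave when it can track only two sharers, which is the usual reading of the "Dir 2 NB" part of the scheme's name. It does not model the lock ("+L") extension or any BY91-1 detail; the class name, the oldest-first victim choice, and the method signatures are all assumptions made for illustration.

```python
class DirectoryEntry:
    """Directory state for one memory block, tracking at most `max_pointers` sharers."""

    def __init__(self, max_pointers=2):
        self.max_pointers = max_pointers
        self.sharers = []  # CPU ids holding a copy, oldest first

    def read(self, cpu):
        """Record a read by `cpu`; return the CPUs whose copies must be invalidated."""
        if cpu in self.sharers:
            return []
        invalidated = []
        if len(self.sharers) == self.max_pointers:
            # No free pointer and no broadcast: evict one existing sharer
            # (oldest first here, an arbitrary choice for this sketch).
            invalidated.append(self.sharers.pop(0))
        self.sharers.append(cpu)
        return invalidated

    def write(self, cpu):
        """Record a write by `cpu`; every other copy must be invalidated."""
        invalidated = [c for c in self.sharers if c != cpu]
        self.sharers = [cpu]
        return invalidated


entry = DirectoryEntry()
print(entry.read(0), entry.read(1), entry.read(2))  # third reader forces an eviction
print(entry.write(1), entry.sharers)
```

Because the directory never needs more than two pointers per block, its storage cost stays fixed as processors are added, at the price of extra invalidations for widely shared data.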
Citations: 1
Analysis of multidimensional loops with non-uniform dependences
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574056
J. Sogno
A parallelizing compiler based mainly on loop transformations requires dependence information that is as complete and precise as possible. In this paper, we propose a generalized method for computing, in any multi-dimensional loop, information that has proved useful in the case of irregular dependences. First, we solve the basic problem of the existence of a dependence with an algorithm composed of a preprocessing reduction phase and an integer simplex resolution. If a solution exists, we compute by integer simplex the bounds of the distances associated with the loop indices. Depending on the values of these bounds, we finally define problems that consist of evaluating the bounds of the slopes of dependence vectors, which we solve by integer linear fractional programming. The amount of computation for each new problem is very low. This algorithm has been implemented as an extension of the Janus Test, which was presented in a previous work.
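As background for the dependence-existence problem in this abstract, the Python sketch below shows the classical GCD test for a single-subscript array reference. It is far simpler than the integer simplex approach or the Janus Test described here, and it ignores loop bounds, so a `True` result only means a dependence cannot be ruled out. The function name and parameter names are hypothetical.

```python
from math import gcd


def gcd_test(coef_write, const_write, coef_read, const_read):
    """May a[coef_write*i + const_write] and a[coef_read*j + const_read] overlap?

    The accesses touch the same element when
        coef_write*i - coef_read*j = const_read - const_write
    has an integer solution, which requires gcd(coef_write, coef_read)
    to divide the right-hand side. Loop bounds are not considered.
    """
    diff = const_read - const_write
    g = gcd(coef_write, coef_read)
    if g == 0:
        # Both subscripts are constants: they overlap only if they are equal.
        return diff == 0
    return diff % g == 0


# a[2*i] written, a[2*j + 1] read: even vs. odd elements, never the same.
print(gcd_test(2, 0, 2, 1))
# a[2*i] written, a[4*j + 6] read: i = 2*j + 3 is a solution.
print(gcd_test(2, 0, 4, 6))
```

A test like this can only answer whether a dependence may exist; computing distance bounds and slopes of dependence vectors, as the abstract describes, requires the heavier integer programming machinery.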
Citations: 4
Control mechanism for software pipelining on nested loop
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574053
Tao Yu, Zhizhong Tang, Chihong Zhang, Jun Luo
ILSP (Interlaced inner and outer Loop Software Pipelining) is an efficient algorithm for optimizing operations in nested loops. To ensure that ILSP has good time and space efficiency, there must be an efficient nested control mechanism to support the algorithm. Our control mechanism is realized in hardware; it avoids adding many extra instructions and minimises the II (Initialization Interval) of each loop in the nested loop. In cooperation with the compiler, our nested loop control mechanism can efficiently support software pipelining of the nested loop and can ensure that ILSP has a high speedup and a low space cost.
Citations: 5