Manish Gupta, S. Midkiff, E. Schonberg, V. Seshadri, David Shields, Ko-Yang Wang, Wai-Mee Ching, T. Ngo
We describe pHPF, a research prototype HPF compiler for the IBM SP series parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically parallelized. The compiler supports symbolic analysis of expressions, which allows parameters such as the number of processors to be unknown at compile time without significantly affecting performance. Communication schedules and computation guards are generated in a parameterized form at compile time. Several novel optimizations and improved versions of well-known optimizations have been implemented in pHPF to exploit parallelism and reduce communication costs. These optimizations include elimination of redundant communication using data-availability analysis; use of collective communication; new techniques for mapping scalar variables; coarse-grain wavefronting; and communication reduction in multi-dimensional shift communications. We present experimental results for some well-known benchmark routines. The results show the effectiveness of the compiler in generating efficient code for HPF programs.
{"title":"An HPF Compiler for the IBM SP2","authors":"Manish Gupta, S. Midkiff, E. Schonberg, V. Seshadri, David Shields, Ko-Yang Wang, Wai-Mee Ching, T. Ngo","doi":"10.1145/224170.224422","DOIUrl":"https://doi.org/10.1145/224170.224422","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"122819487"}
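To make the idea of a computation guard that stays parameterized by the processor count more concrete, here is a minimal Python sketch of the owner-computes rule for a BLOCK-distributed array. It is an illustration only, not pHPF's generated code: the helper names `block_bounds` and `guarded_update`, the ceiling-division block formula, and the sequential simulation of processors are all assumptions.

```python
def block_bounds(n, nprocs, rank):
    """Half-open range [lo, hi) owned by `rank` when n elements are
    BLOCK-distributed over nprocs processors (hypothetical helper)."""
    block = -(-n // nprocs)            # ceiling division gives the block size
    lo = min(rank * block, n)
    hi = min(lo + block, n)
    return lo, hi


def guarded_update(a, b, nprocs, rank):
    """Owner-computes form of `a[i] = b[i] + 1` for i = 1 .. n-1.

    The guard is the intersection of the loop bounds with the owned block,
    so nprocs and rank can remain runtime parameters."""
    n = len(a)
    lo, hi = block_bounds(n, nprocs, rank)
    for i in range(max(lo, 1), hi):
        a[i] = b[i] + 1


# Simulate three processors one after another on a single shared list.
b = list(range(10))
a = [0] * 10
for rank in range(3):
    guarded_update(a, b, nprocs=3, rank=rank)
print(a)
```

Because the bounds are computed from `nprocs` and `rank` at run time, the same code serves any processor count; a real compiler would additionally generate the communication needed when `b` is distributed differently from `a`.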
Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, M. Lam
This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler, developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array and scalar variables, and symbolic analysis of array subscripts. The interprocedural analysis framework is designed to provide analysis results nearly as precise as full inlining but without its associated costs. Experimentation with this system shows that it is capable of detecting a coarser granularity of parallelism than previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, frequently requiring modifications to array data structures such as privatization and reduction transformations. Measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially advance the capability of automatic parallelization technology.
{"title":"Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler","authors":"Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, M. Lam","doi":"10.1145/224170.224337","DOIUrl":"https://doi.org/10.1145/224170.224337","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"117136880"}
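The privatization and reduction transformations mentioned in the abstract can be illustrated with a small Python sketch. It is not taken from SUIF: the histogram loop, the function names, and the sequential simulation of workers are assumptions chosen to show the shape of the transformation. The original loop carries a dependence through `hist`; giving each worker a private copy and merging the copies afterwards removes it.

```python
def histogram_sequential(keys, weights, nbins):
    """Original loop: every iteration updates the shared array `hist`."""
    hist = [0] * nbins
    for k, w in zip(keys, weights):
        hist[k] += w
    return hist


def histogram_reduction(keys, weights, nbins, nworkers):
    """Reduction transformation: each worker accumulates into a private
    copy of `hist`; the copies are summed once the loop is finished.
    Workers run one after another here, but their loop bodies touch
    disjoint data and could run concurrently."""
    n = len(keys)
    chunk = -(-n // nworkers)          # iterations per worker (ceiling)
    privates = []
    for worker in range(nworkers):
        private = [0] * nbins          # privatized copy of the array
        for i in range(worker * chunk, min((worker + 1) * chunk, n)):
            private[keys[i]] += weights[i]
        privates.append(private)
    hist = [0] * nbins                 # final merge step
    for private in privates:
        for b in range(nbins):
            hist[b] += private[b]
    return hist


keys = [0, 2, 1, 2, 0, 2, 1]
weights = [1, 2, 3, 4, 5, 6, 7]
print(histogram_sequential(keys, weights, 3))
print(histogram_reduction(keys, weights, 3, nworkers=4))
```

The transformation is valid because integer addition is associative and commutative; with floating-point data the merged result can differ from the sequential one in the last bits.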
{"title":"Where is the Supercomputer Software Revolution?","authors":"Dennis Gannon, L. Smarr, V. Schuster","doi":"10.1145/224170.224507","DOIUrl":"https://doi.org/10.1145/224170.224507","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"115890015"}
Technology is used to empower students to go beyond traditional limitations. EarthVision provides the opportunity to participate in an authentic research environment, enabling students to develop a sense of self-worth and esteem, established in the context of a phased curriculum that brings together experts in a variety of disciplines. New techniques such as modeling and scientific visualization are employed to expand the types of phenomena that can be examined at a high school level. The use of concept strands going from simple elements to complicated representations helps to move the teacher/student teams from a highly structured learning environment to one that is highly independent. The scientific method, which employs validation throughout the computational science process, brings rigor and integrity, which stimulate the skill development needed for autonomy. The result is significant cognitive development coupled with a positive affective orientation.
{"title":"Developing Computational Science Curricula: The EarthVision Experience","authors":"Ralph K. Coppola, E. Toth","doi":"10.1145/224170.224202","DOIUrl":"https://doi.org/10.1145/224170.224202","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"124845002"}
As part of a NASA HPCC Grand Challenge project, we are designing and implementing a parallel atmospheric chemical tracer model that will be suitable for use in global simulations. To accomplish this goal, our starting point has been an atmospheric pollution model that was originally used to study pollution in the Los Angeles Basin. The model includes gas-phase and aqueous-phase chemistry, radiation, aerosol physics, advection, convection, deposition, visibility and emissions. The potential bottlenecks in the model for parallel implementation are the compute-intensive ODE-solving phase, with its load-balancing problems, and the communication-intensive advection phase. We describe the implementation and performance results on a variety of platforms, with emphasis on a detailed performance model we developed to predict performance, identify bottlenecks, guide our implementation, assess scalability, and evaluate architectures. An atmospheric chemical tracer model such as the one we describe in this paper will be one component of a larger Earth Systems Model (ESM), being developed under the direction of C. R. Mechoso of UCLA, incorporating atmospheric dynamics, atmospheric physics, ocean dynamics, and a database and visualization system.
{"title":"Performance of a Parallel Global Atmospheric Chemical Tracer Model","authors":"J. Demmel, Sharon L. Smith","doi":"10.1145/224170.224504","DOIUrl":"https://doi.org/10.1145/224170.224504","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"121759414"}
C. Bernard, C. DeTar, S. Gottlieb, U. Heller, J. Hetrick, N. Ishizuka, L. Kärkkäinen, S. Lantz, K. Rummukainen, R. Sugar, D. Toussaint, M. Wingate
A 512-node IBM Scalable POWERParallel Systems SP2 was installed at the Cornell Theory Center in October 1994. During the past couple of months we have been porting and optimizing code for carrying out lattice QCD calculations. Present performance is far from ideal, however, and optimization efforts are still under way. The rate-limiting step in our code involves a rather generic inversion of a large, sparse system, based on a partial differential equation in a multidimensional space. The insights we have gained so far may be useful in diagnosing performance in a wide class of applications.
{"title":"Lattice QCD on the IBM Scalable POWERParallel Systems SP2","authors":"C. Bernard, C. DeTar, S. Gottlieb, U. Heller, J. Hetrick, N. Ishizuka, L. Kärkkäinen, S. Lantz, K. Rummukainen, R. Sugar, D. Toussaint, M. Wingate","doi":"10.1145/224170.224307","DOIUrl":"https://doi.org/10.1145/224170.224307","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"127544778"}
A parallel time-dependent incompressible flow solver and a parallel multigrid elliptic kernel are described. The flow solver is based on a second-order projection method applied to a staggered finite-difference grid. The multigrid algorithms implemented in the elliptic kernel, which is needed by the flow solver, are V-cycle and full V-cycle schemes. A grid-partition strategy is used in the parallel implementations of both the flow solver and the multigrid elliptic kernel on all fine and coarse grids. Numerical experiments and parallel performance tests show that the parallel solver package is numerically stable, physically robust and computationally efficient. Both the multigrid elliptic kernel and the flow solver scale very well to a large number of processors on the Intel Paragon and the Cray T3D for computations with moderate granularity. The solver package has been carefully designed and coded so that it can be easily adapted to solving a variety of interesting two- and three-dimensional flow problems. The solver package is portable to parallel systems that support MPI, PVM and Intel NX for interprocessor communications.
{"title":"A Parallel Incompressible Flow Solver Package with a Parallel Multigrid Elliptic Kernel","authors":"J. Lou, R. Ferraro","doi":"10.1145/224170.224406","DOIUrl":"https://doi.org/10.1145/224170.224406","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"128802120"}
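For readers unfamiliar with the V-cycle scheme named in the abstract, the following Python sketch shows one on the simplest possible model problem. It is a sequential illustration under stated assumptions, not the authors' kernel: the 1D Poisson problem -u'' = f with zero boundary values, weighted-Jacobi smoothing, full-weighting restriction, linear interpolation, and all function names are choices made here, and the grid partitioning used for the parallel version is not shown.

```python
import math


def smooth(u, f, h, sweeps):
    """Weighted-Jacobi relaxation (weight 2/3) for -u'' = f."""
    n = len(u) - 1
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, n):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = old[i] + (2.0 / 3.0) * (jacobi - old[i])


def residual(u, f, h):
    """r = f - A u for the standard three-point discretization."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting transfer to the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        e[2 * i] = ec[i]
    for i in range(nc):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; len(u) - 1 must be a power of two, at least 2."""
    n = len(u) - 1
    if n == 2:                          # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h, 2)                  # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]                    # coarse-grid correction
    smooth(u, f, h, 2)                  # post-smoothing
    return u


# Model problem whose exact solution is u(x) = sin(pi x).
n = 64
h = 1.0 / n
f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n + 1)]
u = [0.0] * (n + 1)
r0 = max(abs(x) for x in residual(u, f, h))
for _ in range(10):
    v_cycle(u, f, h)
r10 = max(abs(x) for x in residual(u, f, h))
print(r0, r10)
```

The printed pair is the maximum residual before and after ten cycles. A full V-cycle (full multigrid) scheme would instead start on the coarsest grid and interpolate upward, running V-cycles at each level.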
A statistical mechanical approach to the protein folding problem is developed based on computer simulations. The properties of proteins related to conformation and folding are determined from the density of states of the protein. A new simulation procedure, the Entropy Sampling Monte Carlo method, is used to determine accurately the density of states of the protein. To enhance the efficiency of sampling the conformational space of a protein, two techniques (a conformational-biased chain regrowth procedure and a jump-walking method) were introduced into the simulation. Applications of the approach to study a number of model polypeptides and a small protein, Bovine Pancreatic Trypsin Inhibitor, have been carried out. The results obtained demonstrate that the new approach is more powerful and produces richer information about the thermodynamics and folding behavior of proteins than conventional simulation methods.
{"title":"Computational Approach to the Statistical Mechanics of Protein Folding","authors":"M. Hao, H. Scheraga","doi":"10.1145/224170.224216","DOIUrl":"https://doi.org/10.1145/224170.224216","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"127435739"}
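The abstract's statement that thermodynamic properties follow from the density of states can be sketched with the standard canonical-ensemble formulas. The Python below is an illustration only: the function name, the toy two-level system, and the use of ln g(E) as input are assumptions, and the sampling procedure that would estimate g(E) in the first place is not shown.

```python
import math


def thermodynamics(energies, log_g, kT):
    """Canonical averages from a density of states given as ln g(E).

    Returns (mean energy, heat capacity in units of k), using
    Z = sum_E g(E) exp(-E / kT) and C / k = (<E^2> - <E>^2) / (kT)^2.
    """
    log_w = [lg - e / kT for e, lg in zip(energies, log_g)]
    shift = max(log_w)                      # avoids overflow in exp()
    w = [math.exp(x - shift) for x in log_w]
    z = sum(w)
    mean_e = sum(e * wi for e, wi in zip(energies, w)) / z
    mean_e2 = sum(e * e * wi for e, wi in zip(energies, w)) / z
    heat_capacity = (mean_e2 - mean_e ** 2) / (kT * kT)
    return mean_e, heat_capacity


# Toy two-level system: energies 0 and 1, one state at each level.
mean_e, c = thermodynamics([0.0, 1.0], [0.0, 0.0], kT=1.0)
print(mean_e, c)
```

Once g(E) is known, the same function can be evaluated at any temperature without further simulation, which is the practical appeal of working with the density of states.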
Software instrumentation is a widely used technique for parallel program performance evaluation, debugging, steering, and visualization. With increasing sophistication of parallel tool development technologies and broadening of application areas where these tools are being used, runtime data collection and management activities are growing in importance; we use the term instrumentation system (IS) to refer to components that support these activities in state-of-the-art parallel tool environments. An IS consists of Local Instrumentation Servers, an Instrumentation System Manager, and a Transfer Protocol. The overheads and perturbation effects attributed to an IS must be accounted for to ensure correct and efficient representation of program behavior, especially for on-line and real-time environments. Moreover, an IS is a key facilitator of integration of tools in an environment. In this paper, we define the primary components of an IS and their roles in an integrated environment, and classify ISs according to selected features. We introduce a structured approach to plan, design, model, evaluate, implement, and validate an IS. The approach provides a means to formally address domain-specific requirements. The modeling and evaluation processes are illustrated in the context of three distinctive IS case studies for PICL, Paradyn, and Vista. Valuable feedback on performance effects of IS parameters and policies can assist developers in making design decisions early in the software development cycle. Additionally, use of structured software engineering methods can support the mapping of an abstract IS model to an implementation of the IS.
{"title":"A Structured Approach to Instrumentation System Development and Evaluation","authors":"A. Waheed, D. Rover","doi":"10.1145/224170.224271","DOIUrl":"https://doi.org/10.1145/224170.224271","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"132655817"}
Array dataflow information plays an important role in the successful automatic parallelization of Fortran programs. This paper proposes a powerful symbolic array dataflow analysis to support array privatization and loop parallelization for programs with arbitrary control flow graphs and acyclic call graphs. Our scheme summarizes array access information using guarded array regions and propagates such regions over a Hierarchical Supergraph (HSG). The use of guards allows us to use the information in IF conditions to sharpen the array dataflow analysis and thereby to handle difficult cases which elude other existing techniques. The guarded array regions retain the simplicity of set operations for regular array regions in common cases, and they enhance regular array regions in complicated cases by using guards to handle complex symbolic expressions and array shapes. Scalar values that appear in array subscripts and loop limits are substituted on the fly during the array information propagation, which disambiguates the symbolic values precisely for set operations. We present efficient algorithms that implement our scheme. Initial experiments applying our analysis to the Perfect Benchmarks show promising results of improved array privatization.
{"title":"Symbolic Array Dataflow Analysis for Array Privatization and Program Parallelization","authors":"Junjie Gu, Zhiyuan Li, Gyungho Lee","doi":"10.1145/224170.224318","DOIUrl":"https://doi.org/10.1145/224170.224318","journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference"},"publicationDate":"1995-12-08","platform":"Semanticscholar","paperid":"122841202"}
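A rough feel for how guards can sharpen a privatization test is given by the Python sketch below. It is a heavily simplified illustration, not the paper's algorithm: the `GuardedRegion` type, the representation of a guard as a set of condition literals, the one-dimensional inclusive ranges, and the `covers` rule are all assumptions made here. A read is treated as upward-exposed (and so blocks privatization) unless an earlier write in the same iteration covers its range under a guard that the read's guard implies.

```python
from typing import FrozenSet, List, NamedTuple


class GuardedRegion(NamedTuple):
    guard: FrozenSet[str]   # conjunction of IF-condition literals; empty means "always"
    lo: int                 # first array index accessed
    hi: int                 # last array index accessed (inclusive)


def covers(write: GuardedRegion, read: GuardedRegion) -> bool:
    """True if the write is certain to have defined everything the read uses.

    The read's guard implies the write's guard when every literal required
    by the write also appears in the read's guard (a sufficient condition)."""
    guard_implied = write.guard <= read.guard
    return guard_implied and write.lo <= read.lo and read.hi <= write.hi


def upward_exposed(reads: List[GuardedRegion],
                   earlier_writes: List[GuardedRegion]) -> List[GuardedRegion]:
    """Reads not covered by any earlier write in the same iteration."""
    return [r for r in reads if not any(covers(w, r) for w in earlier_writes)]


# Loop body:  IF (p) write A(1:100)  ...  IF (p) read A(1:50)
write = GuardedRegion(frozenset({"p"}), 1, 100)
read_guarded = GuardedRegion(frozenset({"p"}), 1, 50)
read_unguarded = GuardedRegion(frozenset(), 1, 50)

print(upward_exposed([read_guarded], [write]))     # nothing exposed
print(upward_exposed([read_unguarded], [write]))   # the read is exposed
```

Without the guard, the conditional write could not be assumed to happen, so even the guarded read would have to be treated as exposed; keeping the IF condition attached to both regions is what lets the array be recognized as privatizable in the first case.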