
Latest Publications in IEEE Parallel & Distributed Technology: Systems & Applications

Beyond execution time: expanding the use of performance models
Pub Date : 1994-06-01 DOI: 10.1109/88.311571
G. D. Peterson, R. Chamberlain
Improved performance is a major motivation for using parallel computation. However, performance models are frequently used only to predict an algorithm's execution time, not to accurately evaluate how the choices of architecture, operating system, interprocessor communication protocol, and programming language also dramatically affect parallel performance. We have developed an analytic model for synchronous iterative algorithms running on distributed-memory MIMD machines, and refined it for discrete-event simulation. The model describes the execution time of a single run in terms of application parameters such as the number of iterations and the required computation in each, and architectural parameters such as the number of processors, processor speed, and communication time. Our experience has shown us that an analytic model can not only accurately predict an algorithm's performance but can also match the algorithm to an appropriate architecture, identify ways to improve the algorithm's performance, quantify the performance effects of algorithmic or architectural changes, and provide a better understanding of how the algorithm works.
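The abstract does not give the model's closed form. As a hedged sketch of what an analytic model of a synchronous iterative algorithm typically looks like (the symbols below are illustrative, not the paper's notation), the time for one run might be written as

    T(P) \approx I \left( \frac{W}{sP} + C(P) + \tau_{\mathrm{sync}}(P) \right)

where I is the number of iterations, W the computation required per iteration, s the processor speed, P the number of processors, C(P) the per-iteration communication time, and \tau_{\mathrm{sync}}(P) the cost of the barrier ending each iteration; because the algorithm is synchronous, every iteration runs at the pace of the slowest processor.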
Citations: 29
A scalable debugger for massively parallel message-passing programs
Pub Date : 1994-06-01 DOI: 10.1109/88.311572
S. Sistare, Don Allen, R. Bowker, K. Jourdenais, Josh Simons, R. Title
In a message-passing program, there are at least as many threads as processors, and the programmer must deal with large numbers of them on a massively parallel machine. On our target machine, the CM-5, we had previously developed Prism, a programming environment that supports debugging, data visualization, and performance analysis of data-parallel programs. We discuss how our new version, Node Prism, extends Prism's capabilities for message-passing programs. It looks and feels like the data-parallel version, but it uses new methods for user-debugger interaction that promote greater understanding of parallel programs. It offers scalable expression, execution, and interpretation of all debugging operations, making it easier to debug and understand message-passing programs.
Citations: 34
A distributed snooping algorithm for pixel merging
Pub Date : 1994-06-01 DOI: 10.1109/88.311570
M. Cox, P. Hanrahan
Previous pixel-merging algorithms have required special-purpose networks, and use more network bandwidth than is necessary. We developed an algorithm that merges pixels on shared-memory bus multiprocessors, using an existing bus. Analysis and simulations suggest that it uses less bus bandwidth than other algorithms. We based our algorithm on the snooping cache-coherency protocols on which a number of shared-memory multiprocessors have been based. In these architectures, each processor keeps its cache consistent with other processors' memories by listening (snooping) on a shared bus over which memory updates are written. Snooping maintains consistent globally shared memory. This algorithm assists graphics rendering by letting processors compare pixel values and delete those pixels that do not contribute to the final image. This reduces network bandwidth requirements and eliminates the need for a special-purpose network.
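The bus protocol itself is beyond a short sketch, but the per-pixel test each snooping processor applies can be shown compactly. A minimal depth-test merge in C, assuming z-buffer-style compositing (the Pixel type and function names are illustrative, not from the paper):

    #include <float.h>

    /* One rendered fragment: a color and a depth for one screen location. */
    typedef struct {
        unsigned int rgba;   /* packed color */
        float        z;      /* depth; smaller is closer to the viewer */
    } Pixel;

    /* Depth-test merge: keep the incoming pixel only if it is closer than
     * the one already stored.  A pixel that loses this test cannot affect
     * the final image, so a snooping processor can drop it rather than
     * forward it -- the source of the bandwidth saving. */
    static void merge_pixel(Pixel *local, const Pixel *incoming)
    {
        if (incoming->z < local->z)
            *local = *incoming;
    }

    /* Framebuffer entries start out "infinitely far away". */
    static void clear_pixel(Pixel *p)
    {
        p->rgba = 0;
        p->z    = FLT_MAX;
    }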
Citations: 15
Multiapplication support in a parallel-program performance tool
Pub Date : 1994-03-01 DOI: 10.1109/88.281874
R. Irvin, B. Miller
We added new features for analyzing multiple programs to the IPS-2 parallel-program performance tool and were surprised at the wide range of performance problems for which this modified IPS-2 can be used. With multiapplication IPS-2, programmers can simultaneously run and analyze cooperating or contending applications; combine performance displays and metrics of multiple applications or multiple versions of the same application to directly compare performance; analyze critical paths of execution for individual applications, for a single application and the applications with which it interacts, or for entire workloads; study how the application workload performance affects the hardware, operating system, and network performance; study an application's evolution through multiple versions, hardware platforms, or input sets; study a workload's aggregate behavior, how applications interact, or how individual applications perform in the presence of other applications; and compare the measured performance of a program with predictions made by simulations or analytical models. This modified parallel-program performance tool analyzes multiple applications in a single session, allowing better performance tuning than is possible when programs are run in isolation.
Citations: 12
Defining, analyzing, and transforming program constructs
Pub Date : 1994-03-01 DOI: 10.1109/88.281872
Jingke Li, M. Wolfe
We have developed a framework for analyzing the behavior and relations of various sequential and parallel control constructs, which we can nest in a very general way. A simple yet powerful scheme defines the order of data accesses in a program, and provides a well-founded semantic structure for nested constructs. When defining parallel languages or extensions to current languages, designers can use this framework to define how each new feature interacts with the language's other features. Because our approach is based on well-known dependence analysis techniques, it is practical for compiler implementation. It determines which behavior the compiler and system must preserve while allowing aggressive automatic optimization. Instead of being confined to a single programming paradigm, programmers can use the most appropriate constructs for the application, and the compiler can transform and optimize the program for different parallel or sequential architectures.
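The kind of ordering constraint that dependence analysis captures can be shown with two loops (an illustrative C fragment, not an example from the paper):

    #include <stddef.h>

    void example(double *a, const double *b, double *c, size_t n)
    {
        /* Loop-carried flow dependence: iteration i reads a[i-1], which
         * iteration i-1 wrote, so these iterations must run in order. */
        for (size_t i = 1; i < n; i++)
            a[i] = a[i - 1] + b[i];

        /* No cross-iteration dependence: each element is read and written
         * by exactly one iteration, so these may run in parallel. */
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

A compiler that records such data-access orders knows which transformations preserve the program's behavior and which do not.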
Citations: 5
How to measure, present, and compare parallel performance
Pub Date : 1994-03-01 DOI: 10.1109/88.281869
L. Crowl
Presentations of parallel performance can be easily and unintentionally distorted. Following some simple guidelines for measuring, presenting, and comparing performance can greatly improve your presentation's accuracy and effectiveness.
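The article's guidelines are not reproduced in the abstract, but the quantities most often distorted in such presentations have standard definitions:

    S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}

where T_P is the execution time on P processors and T_1 is, importantly, the time of the best sequential program, not the parallel program run on one processor; using the latter is a well-known way to overstate speedup S(P) and efficiency E(P).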
Citations: 48
Spin-lock synchronization on the Butterfly and KSR1
Pub Date : 1994-03-01 DOI: 10.1109/88.281875
Xiaodong Zhang, R. Castañeda, E. Chan
The drawbacks of the simple spin-lock limit its effective use to small critical sections. Applications with large critical sections and a large number of processors require more efficient algorithms to minimize processor and network overheads. Variations on the spin-lock have been tested on the Sequent Symmetry, a bus-based shared-memory multiprocessor. Algorithms for scalable synchronization have also been tested on the BBN Butterfly I, a large-scale shared-memory multiprocessor with a multistage interconnection network (MIN). We have extended the investigation to the BBN GP1000 and TC2000, both MIN-based multiprocessors with network contention heavier than that on the Butterfly I. We have also implemented algorithms on Kendall Square Research's KSR1, a hierarchical-ring (HR) multiprocessor system, to study the effects of cache coherence. The execution behavior of spin-lock algorithms is significantly different between MIN-based and HR-based architectures. Our tests suggest that HR-based architectures handle network and memory contention more efficiently than MIN-based architectures. However, our results also suggest how spin-locks can be made cost-effective on both.
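For context on the variations being compared, a test-and-test-and-set spin-lock with exponential backoff is one of the classic refinements studied on such machines. A minimal sketch in C using GCC atomic builtins (an illustration, not code from the paper):

    /* 0 = free, 1 = held */
    typedef struct { volatile int locked; } spinlock_t;

    static void spin_lock(spinlock_t *l)
    {
        unsigned backoff = 1;
        for (;;) {
            /* "Test": spin on an ordinary read, satisfied from the local
             * cache, so waiting generates no bus or network traffic. */
            while (l->locked)
                ;
            /* "Test-and-set": one atomic transaction to try to acquire. */
            if (__sync_lock_test_and_set(&l->locked, 1) == 0)
                return;
            /* Lost the race: back off before rereading, so that heavy
             * contention does not saturate the interconnect. */
            for (unsigned i = 0; i < backoff; i++)
                __asm__ volatile ("" ::: "memory");
            if (backoff < (1u << 10))
                backoff <<= 1;
        }
    }

    static void spin_unlock(spinlock_t *l)
    {
        __sync_lock_release(&l->locked);   /* release store of 0 */
    }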
Citations: 22
Compiling functional parallelism on distributed-memory systems
Pub Date : 1994-03-01 DOI: 10.1109/88.281878
S. Pande, D. Agrawal, J. Mauney
We have developed an automatic compilation method that combines data- and code-based approaches to schedule a program's functional parallelism onto distributed memory systems. Our method works with Sisal, a parallel functional language, and replaces the back end of the Optimizing Sisal Compiler so that it produces code for distributed memory systems. Our extensions allow the compiler to generate code for Intel's distributed-memory Touchstone iPSC/860 machines (Gamma, Delta, and Paragon). The modified compiler can generate a partition that minimizes program completion time (for systems with many processors) or the required number of processors (for systems with few processors). To accomplish this, we have developed a heuristic algorithm that uses the new concept of threshold to treat the problem of scheduling as a trade-off between schedule length and the number of required processors. Most compilers for distributed memory systems force the programmer to partition the data or the program code. This modified version of a Sisal compiler handles both tasks automatically in a unified framework, and lets the programmer compile for a chosen number of processors.
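The abstract does not spell out the threshold heuristic, so the following is only a hedged illustration of the trade-off it names: a greedy scheduler that opens an additional processor only when every busy one remains occupied past a threshold, so that a large threshold economizes on processors while a small one shortens the schedule (dependences and communication costs are ignored for brevity):

    #include <stdio.h>

    #define MAX_PROCS 64

    /* Tasks arrive in a precedence-respecting order; each goes to the
     * processor that frees up first, unless all of them are busy past
     * `threshold`, in which case a fresh processor is opened. */
    static int schedule(const double *cost, int ntasks, double threshold)
    {
        double finish[MAX_PROCS] = {0};   /* when each processor frees up */
        int used = 1;                     /* processor 0 always exists */

        for (int t = 0; t < ntasks; t++) {
            int best = 0;
            for (int p = 1; p < used; p++)
                if (finish[p] < finish[best])
                    best = p;
            if (finish[best] > threshold && used < MAX_PROCS)
                best = used++;            /* open another processor */
            finish[best] += cost[t];
        }
        return used;                      /* processors actually used */
    }

    int main(void)
    {
        const double cost[] = {3, 1, 4, 1, 5, 9, 2, 6};
        printf("processors used: %d\n", schedule(cost, 8, 4.0));
        return 0;
    }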
Citations: 24
Teraflops and other false goals
Pub Date : 1994-01-22 DOI: 10.1109/MCC.1994.10013
J. Gustafson
The High-Performance Computing and Communications program has been attacked for having vague and dubious goals. Partly because of this perception, 65 members of the House of Representatives cast votes against extending it. We need concrete measures of HPCC progress without such narrowly defined goals. Measuring performance by high flops rates, speedup, and hardware efficiency can take us further from the solution to scientific problems, not closer. This paradox is especially pronounced for "Grand Challenge" and "teraflops computing". The author considers how we need a practical way to define and communicate ends-based performance of an application, not means-based measures such as teraflops or double precision. Human productivity issues such as development time and cost and the quality of the knowledge we obtain should be the basis of our performance metrics.
Citations: 5
Distributed computation of wave propagation models using PVM
Pub Date : 1993-12-01 DOI: 10.1145/169627.169642
R. Ewing, R. Sharpley, D. Mitchum, Patrick O'Leary, J. Sochacki
The Parallel Virtual Machine (PVM) allows researchers to connect workstations, mini-supercomputers, or specialty machines to form a relatively inexpensive, powerful, parallel computer. Such hardware is frequently abundant at research locations, so PVM incurs little or no hardware costs. PVM is also flexible: it uses existing communication networks (Ethernet or fiber) and remote procedural libraries; it lets programmers use either C or Fortran; and it can emulate several commercial architectures including hypercubes, meshes, and rings. The authors believe that PVM can compete effectively with traditional supercomputers, and they have demonstrated its computational power and cost-effectiveness by simulating the propagation of seismic waves using an isolated Ethernet ring comprising an IBM RS/6000 550 as the host and six RS/6000 320H machines as the nodes.
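The paper's simulation code is not shown, but the master side of such a PVM computation follows a standard spawn/scatter/gather pattern. A minimal sketch in C ("wave_worker", the slice size, and the message tags are hypothetical names and values, not from the paper):

    #include <stdio.h>
    #include <pvm3.h>

    #define NWORKERS 6       /* matching the six RS/6000 320H nodes */
    #define SLICE    1024    /* hypothetical size of one worker's slice */

    int main(void)
    {
        int tids[NWORKERS];
        double chunk[SLICE] = {0};

        pvm_mytid();         /* enroll this process in PVM */
        if (pvm_spawn("wave_worker", NULL, PvmTaskDefault,
                      "", NWORKERS, tids) < NWORKERS) {
            fprintf(stderr, "failed to spawn %d workers\n", NWORKERS);
            pvm_exit();
            return 1;
        }

        /* Scatter: pack each worker's slice of the grid and send it. */
        for (int i = 0; i < NWORKERS; i++) {
            pvm_initsend(PvmDataDefault);
            pvm_pkdouble(chunk, SLICE, 1);
            pvm_send(tids[i], 1);          /* message tag 1: work */
        }

        /* Gather: collect each worker's updated slice. */
        for (int i = 0; i < NWORKERS; i++) {
            pvm_recv(tids[i], 2);          /* message tag 2: results */
            pvm_upkdouble(chunk, SLICE, 1);
        }

        pvm_exit();
        return 0;
    }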
Citations: 10