J. Vetter, P. Worley. "Asserting Performance Expectations." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10046

Traditional techniques for performance analysis provide a means for extracting and analyzing raw performance information from applications. Users then compare this raw data to their performance expectations for application constructs. This comparison can be tedious at the scale of today's architectures and software systems. To address this situation, we present a methodology and prototype that allow users to assert performance expectations explicitly in their source code using performance assertions. As the application executes, each performance assertion collects data implicitly to verify the assertion. Because the user specifies a performance expectation for individual code segments, the runtime system can jettison raw data for measurements that pass their expectation, while reacting to failures with a variety of responses. We present several compelling uses of performance assertions with our operational prototype, including raising a performance exception, validating a performance model, and adapting an algorithm empirically at runtime.
{"title":"Asserting Performance Expectations","authors":"J. Vetter, P. Worley","doi":"10.1109/SC.2002.10046","DOIUrl":"https://doi.org/10.1109/SC.2002.10046","url":null,"abstract":"Traditional techniques for performance analysis provide a means for extracting and analyzing raw performance information from applications. Users then compare this raw data to their performance expectations for application constructs. This comparison can be tedious for the scale of today's architectures and software systems. To address this situation, we present a methodology and prototype that allows users to assert performance expectations explicitly in their source code using performance assertions. As the application executes, each performance assertion in the application collects data implicitly to verify the assertion. By allowing the user to specify a performance expectation with individual code segments, the runtime system can jettison raw data for measurements that pass their expectation, while reacting to failures with a variety of responses. We present several compelling uses of performance assertions with our operational prototype, including raising a performance exception, validating a performance model, and adapting an algorithm empirically at runtime.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133690683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Andrade, T. Kurç, A. Sussman, J. Saltz. "Active Proxy-G: Optimizing the Query Execution Process in the Grid." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10031

The Grid environment facilitates collaborative work and allows many users to query and process data over geographically dispersed data repositories. Over the past several years, there has been a growing interest in developing applications that interactively analyze datasets, potentially in a collaborative setting. We describe the Active Proxy-G service that is able to cache query results, use those results for answering new incoming queries, generate subqueries for the parts of a query that cannot be produced from the cache, and submit the subqueries for final processing at application servers that store the raw datasets. We present an experimental evaluation to illustrate the effects of various design tradeoffs. We also show the benefits that two real applications gain from using the middleware.
{"title":"Active Proxy-G: Optimizing the Query Execution Process in the Grid","authors":"H. Andrade, T. Kurç, A. Sussman, J. Saltz","doi":"10.1109/SC.2002.10031","DOIUrl":"https://doi.org/10.1109/SC.2002.10031","url":null,"abstract":"The Grid environment facilitates collaborative work and allows many users to query and process data over geographically dispersed data repositories. Over the past several years, there has been a growing interest in developing applications that interactively analyze datasets, potentially in a collaborative setting. We describe the Active Proxy-G service that is able to cache query results, use those results for answering new incoming queries, generate subqueries for the parts of a query that cannot be produced from the cache, and submit the subqueries for final processing at application servers that store the raw datasets. We present an experimental evaluation to illustrate the effects of various design tradeoffs. We also show the benefits that two real applications gain from using the middleware.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134396921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Bustamante, Patrick M. Widener, K. Schwan. "Scalable Directory Services Using Proactivity." ACM/IEEE SC 2002 Conference (SC'02). doi:10.5555/762761.762786

Common to computational grids and pervasive computing is the need for an expressive, efficient, and scalable directory service that provides information about objects in the environment. We argue that a directory interface that ‘pushes’ information to clients about changes to objects can significantly improve scalability. This paper describes the design, implementation, and evaluation of the Proactive Directory Service (PDS). PDS’ interface supports a customizable ‘proactive’ mode through which clients can subscribe to be notified about changes to their objects of interest. Clients can dynamically tune the detail and granularity of these notifications through filter functions instantiated at the server or at the object’s owner, and by remotely tuning the functionality of those filters. We compare PDS’ performance against off-the-shelf implementations of DNS and the Lightweight Directory Access Protocol. Our evaluation results confirm the expected performance advantages of this approach and demonstrate that customized notification through filter functions can reduce bandwidth utilization while improving the performance of both clients and directory servers.
{"title":"Scalable Directory Services Using Proactivity","authors":"F. Bustamante, Patrick M. Widener, K. Schwan","doi":"10.5555/762761.762786","DOIUrl":"https://doi.org/10.5555/762761.762786","url":null,"abstract":"Common to computational grids and pervasive computing is the need for an expressive, efficient, and scalable directory service that provides information about objects in the environment. We argue that a directory interface that ‘pushes’ information to clients about changes to objects can significantly improve scalability. This paper describes the design, implementation, and evaluation of the Proactive Directory Service (PDS). PDS’ interface supports a customizable ‘proactive’ mode through which clients can subscribe to be notified about changes to their objects of interest. Clients can dynamically tune the detail and granularity of these notifications through filter functions instantiated at the server or at the object’s owner, and by remotely tuning the functionality of those filters. We compare PDS’ performance against off-the-shelf implementations of DNS and the Lightweight Directory Access Protocol. Our evaluation results confirm the expected performance advantages of this approach and demonstrate that customized notification through filter functions can reduce bandwidth utilization while improving the performance of both clients and directory servers.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"194 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134073723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Vuduc, J. Demmel, K. Yelick, S. Kamil, R. Nishtala, Benjamin C. Lee. "Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10025

We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned code approaches these limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern in the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our tuned implementations with respect to these bounds using hardware counter data on four different platforms and a test set of 44 sparse matrices. We find that we can often get within 20% of the upper bound, particularly on the class of matrices from finite element modeling (FEM) problems; on non-FEM matrices, performance improvements of 2× are still possible. Lastly, we present a new heuristic that selects optimal or near-optimal register block sizes (the key tuning parameters) more accurately than our previous heuristic. Using the new heuristic, we show improvements in SpM×V performance (Mflop/s) by as much as 2.5× over an untuned implementation. Collectively, our results suggest that future performance improvements, beyond those that we have already demonstrated for SpM×V, will come from two sources: (1) consideration of higher-level matrix structures (e.g., exploiting symmetry, matrix reordering, multiple register block sizes), and (2) optimizing kernels with more opportunity for data reuse (e.g., sparse matrix-multiple vector multiply, multiplication of AᵀA by a vector).
{"title":"Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply","authors":"R. Vuduc, J. Demmel, K. Yelick, S. Kamil, R. Nishtala, Benjamin C. Lee","doi":"10.1109/SC.2002.10025","DOIUrl":"https://doi.org/10.1109/SC.2002.10025","url":null,"abstract":"We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned code approaches these limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern in the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our tuned implementations with respect to these bounds using hardware counter data on 4 different platforms and on test set of 44 sparse matrices. We find that we can often get within 20% of the upper bound, particularly on class of matrices from finite element modeling (FEM) problems; on non-FEM matrices, performance improvements of 2× are still possible. Lastly, we present new heuristic that selects optimal or near-optimal register block sizes (the key tuning parameters) more accurately than our previous heuristic. Using the new heuristic, we show improvements in SpM×V performance (Mflop/s) by as much as 2.5× over an untuned implementation. Collectively, our results suggest that future performance improvements, beyond those that we have already demonstrated for SpM×V, will come from two sources: (1) consideration of higher-level matrix structures (e.g. exploiting symmetry, matrix reordering, multiple register block sizes), and (2) optimizing kernels with more opportunity for data reuse (e.g. sparse matrix-multiple vector multiply, multiplication of AT A by a vector).","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117330874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charng-Da Lu, D. Reed. "Compact Application Signatures for Parallel and Distributed Scientific Codes." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10059

Understanding the dynamic behavior of parallel programs is key to developing efficient system software and runtime environments; this is even more true on emerging computational Grids where resource availability and performance can change in unpredictable ways. Event tracing provides details on behavioral dynamics, albeit often at great cost. We describe an intermediate approach, based on curve fitting, that retains many of the advantages of event tracing but with lower overhead. These compact "application signatures" summarize the time-varying resource needs of scientific codes from historical trace data. We also developed a comparison scheme that measures similarity between two signatures, both across executions and across execution environments.
{"title":"Compact Application Signatures for Parallel and Distributed Scientific Codes","authors":"Charng-Da Lu, D. Reed","doi":"10.1109/SC.2002.10059","DOIUrl":"https://doi.org/10.1109/SC.2002.10059","url":null,"abstract":"Understanding the dynamic behavior of parallel programs is key to developing efficient system software and runtime environments; this is even more true on emerging computational Grids where resource availability and performance can change in unpredictable ways. Event tracing provides details on behavioral dynamics, albeit often at great cost. We describe an intermediate approach, based on curve fitting, that retains many of the advantages of event tracing but with lower overhead. These compact \"application signatures\" summarize the time-varying resource needs of scientific codes from historical trace data. We also developed a comparison scheme that measures similarity between two signatures, both across executions and across execution environments.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131219376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Spencer, R. Ferreira, M. Beynon, T. Kurç, Ümit V. Çatalyürek, A. Sussman, J. Saltz. "Executing Multiple Pipelined Data Analysis Operations in the Grid." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10015

Processing of data in many data analysis applications can be represented as an acyclic, coarse grain data flow, from data sources to the client. This paper is concerned with scheduling of multiple data analysis operations, each of which is represented as a pipelined chain of processing on data. We define the scheduling problem for effectively placing components onto Grid resources, and propose two scheduling algorithms. Experimental results are presented using a visualization application.
{"title":"Executing Multiple Pipelined Data Analysis Operations in the Grid","authors":"M. Spencer, R. Ferreira, M. Beynon, T. Kurç, Ümit V. Çatalyürek, A. Sussman, J. Saltz","doi":"10.1109/SC.2002.10015","DOIUrl":"https://doi.org/10.1109/SC.2002.10015","url":null,"abstract":"Processing of data in many data analysis applications can be represented as an acyclic, coarse grain data flow, from data sources to the client. This paper is concerned with scheduling of multiple data analysis operations, each of which is represented as a pipelined chain of processing on data. We define the scheduling problem for effectively placing components onto Grid resources, and propose two scheduling algorithms. Experimental results are presented using a visualization application.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128624361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Pierce, G. Fox, Choon-Han Youn, S. Mock, K. Mueller, Ozgur Balsoy. "Interoperable Web Services for Computational Portals." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10030

Computational web portals are designed to simplify access to diverse sets of high performance computing resources, typically through an interface to computational Grid tools. An important shortcoming of these portals is their lack of interoperable and reusable services. This paper presents an overview of research efforts undertaken by our group to build interoperating portal services around a Web Services model. We present a comprehensive view of an interoperable portal architecture, beginning with core portal services that can be used to build Application Web Services, which in turn may be aggregated and managed through portlet containers.
L. D. Rose, K. Ekanadham, J. Hollingsworth, S. Sbaraglia. "SIGMA: A Simulator Infrastructure to Guide Memory Analysis." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10055

In this paper we present SIGMA (Simulation Infrastructure to Guide Memory Analysis), a new data collection framework and family of cache analysis tools. The SIGMA environment provides detailed cache information by gathering memory reference data using software-based instrumentation. This infrastructure can facilitate quick probing into the factors that influence the performance of an application by highlighting bottleneck scenarios including: excessive cache/TLB misses and inefficient data layouts. The tool can also assist in perturbation analysis to determine performance variations caused by changes to architecture or program. Our validation tests using the SPEC Swim benchmark show that most of the performance metrics obtained with SIGMA are within 1% of the metrics obtained with hardware performance counters, with the advantage that SIGMA provides performance data on a data structure level, as specified by the programmer.
{"title":"SIGMA: A Simulator Infrastructure to Guide Memory Analysis","authors":"L. D. Rose, K. Ekanadham, J. Hollingsworth, S. Sbaraglia","doi":"10.1109/SC.2002.10055","DOIUrl":"https://doi.org/10.1109/SC.2002.10055","url":null,"abstract":"In this paper we present SIGMA (Simulation Infrastructure to Guide Memory Analysis), a new data collection framework and family of cache analysis tools. The SIGMA environment provides detailed cache information by gathering memory reference data using software-based instrumentation. This infrastructure can facilitate quick probing into the factors that influence the performance of an application by highlighting bottleneck scenarios including: excessive cache/TLB misses and inefficient data layouts. The tool can also assist in perturbation analysis to determine performance variations caused by changes to architecture or program. Our validation tests using the SPEC Swim benchmark show that most of the performance metrics obtained with SIGMA are within 1% of the metrics obtained with hardware performance counters, with the advantage that SIGMA provides performance data on a data structure level, as specified by the programmer.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123988907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Grigori, X. Li. "A New Scheduling Algorithm for Parallel Sparse LU Factorization with Static Pivoting." ACM/IEEE SC 2002 Conference (SC'02). doi:10.1109/SC.2002.10032

In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of Lᵀ and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real-world application matrices on an IBM SP RS/6000 distributed-memory machine.
{"title":"A New Scheduling Algorithm for Parallel Sparse LU Factorization with Static Pivoting","authors":"L. Grigori, X. Li","doi":"10.1109/SC.2002.10032","DOIUrl":"https://doi.org/10.1109/SC.2002.10032","url":null,"abstract":"In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of LT and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm on real world application matrices on an IBM SP RS/6000 distributed memory machine.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131004115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Hiraki, M. Inaba, J. Tamatsukuri, Ryutaro Kurusu, Yukichi Ikuta, Hisashi Koga, A. Zinzaki. "Data Reservoir: Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research." ACM/IEEE SC 2002 Conference (SC'02). doi:10.5555/762761.762826

We propose a data sharing facility for data-intensive scientific research, the "Data Reservoir", which is optimized to transfer huge data files between distant sites while fully utilizing a multi-gigabit backbone network. In addition, a Data Reservoir can be used as an ordinary UNIX server on a local network without any modification to server software. We use a low-level protocol and hierarchical striping to achieve (1) separation of bulk data transfer from local accesses by caching, (2) file-system transparency, i.e., interoperability with any layer above the disk driver, including the file system, and (3) scalability of network and storage. This paper presents our design, an implementation using the iSCSI protocol [1], and performance results for both a 1 Gbps model on a real network and a 10 Gbps model in our laboratory.
{"title":"Data Reservoir: Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research","authors":"K. Hiraki, M. Inaba, J. Tamatsukuri, Ryutaro Kurusu, Yukichi Ikuta, Hisashi Koga, A. Zinzaki","doi":"10.5555/762761.762826","DOIUrl":"https://doi.org/10.5555/762761.762826","url":null,"abstract":"We propose data sharing facility for data intensive scientific research, \"Data Reservoir\"; which is optimized to transfer huge amount of data files between distant places fully Utilizing multi-gigabit backbone network. In addition, \"Data Reservoir\" can be used as an ordinary UNIX server in local network without any modification of server softwares. We use low-level protocol and hierarchical striping to realize (1) separation of bulk data transfer and local accesses by cashing, (2) file-system transparency, I.e. interoperable whatever in higher layer than disk driver, including file system. (3) scalability for network and storage. This paper shows our design, implementation using iSCSI protocol [1] and their performances for both 1Gbps model in the real network and 10Gbps model in our laboratory.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128894043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}