The continuing decrease in memory capacity per core and the increasing disparity between core count and off-chip memory bandwidth create significant challenges for I/O operations in exascale systems. These challenges require rethinking collective I/O to effectively exploit the correlation among I/O accesses at exascale. In this study we introduce a Memory-Conscious Collective I/O that respects the constraint of limited memory space. It (1) restricts aggregation data traffic within disjoint subgroups, (2) coordinates I/O accesses at both the intra-node and inter-node layers, and (3) determines I/O aggregators at run time, considering the data distribution and memory consumption among processes.
{"title":"Poster: Memory-Conscious Collective I/O for Extreme-Scale HPC Systems","authors":"Yin Lu, Yong Chen, R. Thakur, Zhuang Yu","doi":"10.1145/2491661.2481430","DOIUrl":"https://doi.org/10.1145/2491661.2481430","url":null,"abstract":"The continuing decrease in memory capacity per core and the increasing disparity between core count and off-chip memory bandwidth create significant challenges for I/O operations in exascale systems. The exascale challenges require rethinking collective I/O for the effective exploitation of the correlation among I/O accesses in the exascale system. In this study we introduce a Memory-Conscious Collective I/O considering the constraint of the memory space. 1)Restricts aggregation data traffic within disjointed subgroups 2)Coordinates I/O accesses in intra-node and inter-node layer 3)Determines I/O aggregators at run time considering data distribution and memory consumption among processes.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"27 1","pages":"1362-1362"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80244915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael O. Lam, B. Supinski, M. LeGendre, J. Hollingsworth
As scientific computation continues to scale, it is crucial to use floating-point arithmetic processors as efficiently as possible. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set will produce inaccurate results. In this poster, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double precision. This allows developers to easily experiment with mixed-precision configurations without modifying their source code, and it permits auto-tuning of floating-point precision. We also implemented a simple search algorithm to automatically identify which code regions can use lower precision. We include results for several benchmarks that show both the efficacy and overhead of our tool.
{"title":"Abstract: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation","authors":"Michael O. Lam, B. Supinski, M. LeGendre, J. Hollingsworth","doi":"10.1145/2464996.2465018","DOIUrl":"https://doi.org/10.1145/2464996.2465018","url":null,"abstract":"As scientific computation continues to scale, it is crucial to use floating-point arithmetic processors as efficiently as possible. Lower precision allows streaming architectures to perform more operations per second and can reduce memory bandwidth pressure on all architectures. However, using a precision that is too low for a given algorithm and data set will result in inaccurate results. In this poster, we present a framework that uses binary instrumentation and modification to build mixed-precision configurations of existing binaries that were originally developed to use only double-precision. This allows developers to easily experiment with mixed-precision configurations without modifying their source code, and it permits auto-tuning of floating-point precision. We also implemented a simple search algorithm to automatically identify which code regions can use lower precision. We include results for several benchmarks that show both the efficacy and overhead of our tool.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"154 1","pages":"1423-1423"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75958805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this poster, we present a parallel Image-to-Mesh Conversion (I2M) algorithm with quality and fidelity guarantees achieved by dynamic point insertions and removals. Starting directly from an image, it is able to recover the surface and mesh the volume with tetrahedra of good shape. Our tightly-coupled shared-memory parallel speculative execution paradigm employs carefully designed memory and contention managers, load balancing, synchronization, and optimization schemes, while maintaining high single-threaded performance: our single-threaded performance is faster than CGAL, the state-of-the-art sequential I2M software we are aware of. Our meshes also come with theoretical guarantees: the radius-edge ratio is less than 2 and the planar angles of the boundary triangles are more than 30 degrees. The effectiveness of our method is shown on Blacklight, the large cache-coherent NUMA machine of the Pittsburgh Supercomputing Center. We observe more than 74% strong-scaling efficiency and super-linear weak-scaling efficiency for up to 128 cores.
{"title":"High Quality Real-Time Image-to-Mesh Conversion for Finite Element Simulations","authors":"Panagiotis A. Foteinos, N. Chrisochoides","doi":"10.1145/2464996.2465439","DOIUrl":"https://doi.org/10.1145/2464996.2465439","url":null,"abstract":"In this poster, we present a parallel Image-to-Mesh Conversion (I2M) algorithm with quality and fidelity guarantees achieved by dynamic point insertions and removals. Starting directly from an image, it is able to recover the surface and mesh the volume with tetrahedra of good shape. Our tightly-coupled shared-memory parallel speculative execution paradigm employs carefully designed memory and contention managers, load balancing, synchronization and optimizations schemes, while it maintains high single-threaded performance: our single-threaded performance is faster than CGAL, the state of the art sequential I2M software we are aware of. Our meshes come also with theoretical guarantees: the radius-edge is less than 2 and the planar angles of the boundary triangles are more than 30 degrees. The effectiveness of our method is shown on Blacklight, the large cache-coherent NUMA machine of Pittsburgh Supercomputing Center. We observe a more than 74% strong scaling efficiency for up to 128 cores and a super-linear weak scaling efficiency for up to 128 cores.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"12 1","pages":"1552-1553"},"PeriodicalIF":0.0,"publicationDate":"2013-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73265437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-12-03. DOI: 10.1109/CloudCom.2012.6427493
Satoshi Takahashi, H. Nakada, A. Takefusa, T. Kudoh, Maiko Shigeno, Akiko Yoshise
VM (Virtual Machine)-based flexible capacity management is an effective scheme for reducing total power consumption in the data center. However, it raises the following issues: the tradeoff between power saving and user experience, deciding a VM packing within feasible computation time, and avoiding collisions among VM migration processes. To resolve these issues, we propose a matching-based and a greedy-type VM packing algorithm, each of which can decide a suitable VM packing plan in polynomial time. The experiments evaluate not only basic performance but also the feasibility of the algorithms by comparing them with optimization solvers. The feasibility experiment uses supercomputer trace data prepared by the Center for Computational Sciences of the University of Tsukuba. The basic performance experiment shows that the algorithms reduce total power consumption by between 18% and 50%.
{"title":"Abstract: Virtual Machine Packing Algorithms for Lower Power Consumption","authors":"Satoshi Takahashi, H. Nakada, A. Takefusa, T. Kudoh, Maiko Shigeno, Akiko Yoshise","doi":"10.1109/CloudCom.2012.6427493","DOIUrl":"https://doi.org/10.1109/CloudCom.2012.6427493","url":null,"abstract":"VM (Virtual Machine)-based flexible capacity man- agement is an effective scheme to reduce total power consumption in the data center. However, there have been the following issues, tradeoff of power-saving and user experience, decision of VM packing in feasible calculation time and collision avoidance of VM migration processes. In order to resolve these issues, we propose a matching-based and a greedy-type VM packing algorithm, which enables to decide a suitable VM packing plan in polynomial time. The experiments evaluate not only a basic performance, but also a feasibility of the algorithms by comparing with optimization solvers. The feasibility experiment uses a super computer trace data prepared by Center for Computational Sciences of Univer- sity of Tsukuba. The basic performance experiment shows that the algorithms reduce total power consumption by between 18% and 50%.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"34 1","pages":"1517-1518"},"PeriodicalIF":0.0,"publicationDate":"2012-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79439849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-12-01. DOI: 10.1109/SC.Companion.2012.259
Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry
We describe our efforts to provide a form of automated search of handwritten content for digitized document archives. To carry out the search we use a computer vision technique called word spotting. A form of content-based image retrieval, it avoids the still-difficult task of directly recognizing text by allowing a user to search with a query image containing handwritten text and ranking a database of images by how similar their content appears. In order to make this search capability available on an archive, three computationally expensive pre-processing steps are required. We augment this automated portion of the process with a passive crowdsourcing element that mines queries from the system's users in order to improve the results of future queries. We benchmark the proposed framework on 1930s Census data, a collection of roughly 3.6 million forms and 7 billion individual units of information.
{"title":"Abstract: Digitization and Search: A Non-Traditional Use of HPC","authors":"Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry","doi":"10.1109/SC.Companion.2012.259","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.259","url":null,"abstract":"We describe our efforts to provide a form of automated search of handwritten content for digitized document archives. To carry out the search we use a computer vision technique called word spotting. A form of content based image retrieval, it avoids the still difficult task of directly recognizing text by allowing a user to search using a query image containing handwritten text and ranking a database of images in terms of those that contain more similar looking content. In order to make this search capability available on an archive three computationally expensive pre-processing steps are required. We augment this automated portion of the process with a passive crowd sourcing element that mines queries from the systems users in order to then improve the results of future queries. We benchmark the proposed framework on 1930s Census data, a collection of roughly 3.6 million forms and 7 billion individual units of information.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"42 1","pages":"1460-1461"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81538041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10. DOI: 10.1109/SC.Companion.2012.115
K. Moreland, Brad King, Robert Maynard, K. Ma
We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message-passing processes to much finer-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementations; processor and compiler technology is currently changing rapidly. In this paper we describe our approach to addressing these two immediate problems with respect to scientific analysis and visualization algorithms. Our approach to accelerator programming forms the basis of the Dax toolkit, a framework for building data analysis and visualization algorithms applicable to exascale computing.
{"title":"Flexible Analysis Software for Emerging Architectures","authors":"K. Moreland, Brad King, Robert Maynard, K. Ma","doi":"10.1109/SC.Companion.2012.115","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.115","url":null,"abstract":"We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. In this paper we describe our approach to address these two immediate problems with respect to scientific analysis and visualization algorithms. Our approach to accelerator programming forms the basis of the Dax toolkit, a framework to build data analysis and visualization algorithms applicable to exascale computing.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"48 1","pages":"821-826"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74202308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10. DOI: 10.1109/SC.Companion.2012.31
Miki Horiuchi, K. Taura
Data I/O has been one of the major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such task scheduling algorithms require the user to explicitly describe the files accessed by each job, typically through stage-in/stage-out directives in the job description; such annotations are at best tedious and sometimes impossible to write. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting the input/output files of each job without user-supplied annotations. It predicts I/O files by collecting file access history in a profiling run prior to the production run. We implemented the proposed method in the workflow system GXP Make and the distributed file system Mogami, and we evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.
{"title":"Acceleration of Data-Intensive Workflow Applications by Using File Access History","authors":"Miki Horiuchi, K. Taura","doi":"10.1109/SC.Companion.2012.31","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.31","url":null,"abstract":"Data I/O has been one of major bottlenecks in the execution of data-intensive workflow applications. Appropriate task scheduling of a workflow can achieve high I/O throughput by reducing remote data accesses. However, most such task scheduling algorithms require the user to explicitly describe files to be accessed by each job, typically by stage-in/stage-out directives in job description, where such annotations are at best tedious and sometime impossible. Thus, a more automated mechanism is necessary. In this paper, we propose a method for predicting input/output files of each job without user-supplied annotations. It predicts I/O files by collecting file access history in a profiling run prior to the production run. We implemented the proposed method in a workflow system GXP Make and a distributed file system Mogami. We evaluate our system with two real workflow applications. Our data-aware job scheduler increases the ratio of local file accesses from 50% to 75% in one application and from 23% to 45% in the other. As a result, it reduces the makespan of the two applications by 2.5% and 7.5%, respectively.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"1 1","pages":"157-165"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75160574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10. DOI: 10.1109/SC.Companion.2012.202
Katherine E. Isaacs, Aaditya G. Landge, T. Gamblin, P. Bremer, Valerio Pascucci, B. Hamann
The growth in size and complexity of scaling applications, and of the systems on which they run, poses challenges in analyzing and improving their overall performance. With metrics coming from thousands or millions of processes, visualization techniques are necessary to make sense of the increasing amount of data. To aid the process of exploration and understanding, we announce the initial release of Boxfish, an extensible tool for manipulating and visualizing data pertaining to application behavior. Combining and visually presenting data and knowledge from multiple domains, such as the application's communication patterns and the hardware's network configuration and routing policies, can yield the insight necessary to discover the underlying causes of observed behavior. Boxfish allows users to query, filter and project data across these domains to create interactive, linked visualizations.
{"title":"Abstract: Exploring Performance Data with Boxfish","authors":"Katherine E. Isaacs, Aaditya G. Landge, T. Gamblin, P. Bremer, Valerio Pascucci, B. Hamann","doi":"10.1109/SC.Companion.2012.202","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.202","url":null,"abstract":"The growth in size and complexity of scaling applications and the systems on which they run pose challenges in analyzing and improving their overall performance. With metrics coming from thousands or millions of processes, visualization techniques are necessary to make sense of the increasing amount of data. To aid the process of exploration and understanding, we announce the initial release of Boxfish, an extensible tool for manipulating and visualizing data pertaining to application behavior. Combining and visually presenting data and knowledge from multiple domains, such as the application's communication patterns and the hardware's network configuration and routing policies, can yield the insight necessary to discover the underlying causes of observed behavior. Boxfish allows users to query, filter and project data across these domains to create interactive, linked visualizations.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"24 1","pages":"1380-1381"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74603563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-11-10. DOI: 10.1109/SC.Companion.2012.214
P. DeMar, D. Dykstra, G. Garzoglio, P. Mhashikar, Anupam Rajendran, Wenji Wu
Exascale science translates to big data. In the case of the Large Hadron Collider (LHC), the data is not only immense, it is also globally distributed. Fermilab is host to the LHC Compact Muon Solenoid (CMS) experiment's US Tier-1 Center. It must deal with both scaling and wide-area distribution challenges in processing its CMS data. This poster will describe the ongoing network-related R&D activities at Fermilab as a mosaic of efforts that combine to facilitate big data processing and movement.
{"title":"Abstract: Networking Research Activities at Fermilab for Big Data Analysis","authors":"P. DeMar, D. Dykstra, G. Garzoglio, P. Mhashikar, Anupam Rajendran, Wenji Wu","doi":"10.1109/SC.Companion.2012.214","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.214","url":null,"abstract":"Exascale science translates to big data. In the case of the Large Hadron Collider (LHC), the data is not only immense, it is also globally distributed. Fermilab is host to the LHC Compact Muon Solenoid (CMS) experiment's US Tier-1 Center. It must deal with both scaling and wide-area distribution challenges in processing its CMS data. This poster will describe the ongoing network-related R&D activities at Fermilab as a mosaic of efforts that combine to facilitate big data processing and movement.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"24 1","pages":"1398-1399"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74604088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}