Linear algebra software for large-scale accelerated multicore computing*

IF 16.3 | Region 1 (Mathematics) | Q1 MATHEMATICS | Acta Numerica | Pub Date: 2016-05-01 | DOI: 10.1017/S0962492916000015

A. Abdelfattah, H. Anzt, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, A. YarKhan

Acta Numerica, volume 25, pages 1-160.

Citations: 14

Abstract




Many crucial scientific computing applications, ranging from national security to medical advances, rely on high-performance linear algebra algorithms and technologies, underscoring their importance and broad impact. Here we present the state-of-the-art design and implementation practices for the acceleration of the predominant linear algebra algorithms on large-scale accelerated multicore systems. Examples are given with fundamental dense linear algebra algorithms – from the LU, QR, Cholesky, and LDLT factorizations needed for solving linear systems of equations, to eigenvalue and singular value decomposition (SVD) problems. The implementations presented are readily available via the open-source PLASMA and MAGMA libraries, which represent the next generation modernization of the popular LAPACK library for accelerated multicore systems. To generate the extreme level of parallelism needed for the efficient use of these systems, algorithms of interest are redesigned and then split into well-chosen computational tasks. The task execution is scheduled over the computational components of a hybrid system of multicore CPUs with GPU accelerators and/or Xeon Phi coprocessors, using either static scheduling or light-weight runtime systems. The use of light-weight runtime systems keeps scheduling overheads low, similar to static scheduling, while enabling the expression of parallelism through sequential-like code. This simplifies the development effort and allows exploration of the unique strengths of the various hardware components. Finally, we emphasize the development of innovative linear algebra algorithms using three technologies – mixed precision arithmetic, batched operations, and asynchronous iterations – that are currently of high interest for accelerated multicore systems.
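The idea of redesigning an algorithm and splitting it into well-chosen computational tasks can be illustrated with a tile Cholesky factorization. The following is a minimal pure-Python sketch, not the PLASMA or MAGMA implementation: the function names (`potrf`, `trsm`, `gemm_update`, `tile_cholesky`) are hypothetical and merely borrow the usual BLAS/LAPACK kernel names, the tile size `nb` is an assumed parameter, and the tasks run here in plain sequential order. A runtime system would instead track the data dependencies between the recorded tasks and execute independent ones concurrently.

```python
import math

def potrf(A, k0, k1):
    """Factor the diagonal tile A[k0:k1, k0:k1] in place (lower triangle)."""
    for j in range(k0, k1):
        d = A[j][j] - sum(A[j][p] ** 2 for p in range(k0, j))
        A[j][j] = math.sqrt(d)
        for i in range(j + 1, k1):
            s = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = s / A[j][j]

def trsm(A, i0, i1, k0, k1):
    """Overwrite the tile A[i0:i1, k0:k1] with X, where X * L_kk^T = A_ik."""
    for i in range(i0, i1):
        for j in range(k0, k1):
            s = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = s / A[j][j]

def gemm_update(A, i0, i1, j0, j1, k0, k1):
    """Trailing update A_ij -= L_ik * L_jk^T, touching the lower triangle only."""
    for i in range(i0, i1):
        for j in range(j0, min(j1, i + 1)):
            A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k0, k1))

def tile_cholesky(A, nb):
    """Return (L, tasks): the lower Cholesky factor of the symmetric positive
    definite matrix A and the list of tile tasks in the order they were run."""
    n = len(A)
    L = [[float(x) for x in row] for row in A]            # work on a copy
    tiles = [(s, min(s + nb, n)) for s in range(0, n, nb)]
    tasks = []
    for k, (k0, k1) in enumerate(tiles):
        potrf(L, k0, k1)
        tasks.append(("POTRF", k))
        for i in range(k + 1, len(tiles)):                # panel below the diagonal
            trsm(L, tiles[i][0], tiles[i][1], k0, k1)
            tasks.append(("TRSM", i, k))
        for i in range(k + 1, len(tiles)):                # trailing submatrix
            for j in range(k + 1, i + 1):
                gemm_update(L, tiles[i][0], tiles[i][1],
                            tiles[j][0], tiles[j][1], k0, k1)
                tasks.append(("SYRK" if i == j else "GEMM", i, j, k))
    for i in range(n):                                    # clear the unused upper part
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L, tasks
```

Calling `tile_cholesky(A, nb)` on a small symmetric positive definite matrix returns the factor together with the task list. Within one step `k`, the `TRSM` tasks are mutually independent, as are the trailing updates once their input tiles are ready, which is where the parallelism in a task-based formulation comes from.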


Source journal

Acta Numerica (MATHEMATICS)

CiteScore

26.00

Self-citation rate

0.70%

Publication count

About the journal: Acta Numerica stands as the preeminent mathematics journal, ranking highest in both Impact Factor and MCQ metrics. This annual journal features a collection of review articles that showcase survey papers authored by prominent researchers in numerical analysis, scientific computing, and computational mathematics. These papers deliver comprehensive overviews of recent advances, offering state-of-the-art techniques and analyses. Encompassing the entirety of numerical analysis, the articles are crafted in an accessible style, catering to researchers at all levels and serving as valuable teaching aids for advanced instruction. The broad subject areas covered include computational methods in linear algebra, optimization, ordinary and partial differential equations, approximation theory, stochastic analysis, nonlinear dynamical systems, as well as the application of computational techniques in science and engineering. Acta Numerica also delves into the mathematical theory underpinning numerical methods, making it a versatile and authoritative resource in the field of mathematics.
