Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning
Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications, from materials science, physics, and chemistry to machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied to data mapped onto rectangular grids. However, not all scientific applications conform to this pattern; for example, plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied to data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored to the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide a flexible implementation with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.
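The "batches of spheres" data layout from the plane-wave use case can be illustrated with plain NumPy on a single node. This is only a conceptual sketch of that layout, not the FFTB interface; all names and sizes below are made up for illustration.

```python
import numpy as np

# Conceptual sketch of the plane-wave DFT use case (not the FFTB API):
# Fourier coefficients live only on a sphere |G| <= G_max inside the full cubic grid,
# and a batch of such spheres (e.g. one per electronic band) is transformed at once.
n, n_bands, g_max = 32, 4, 10.0

# Integer frequency grid and spherical mask of "active" plane-wave coefficients.
freqs = np.fft.fftfreq(n, d=1.0 / n)                      # 0..n/2-1, -n/2..-1 in FFT order
gx, gy, gz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
sphere = gx**2 + gy**2 + gz**2 <= g_max**2                # boolean mask, True inside sphere

# Batched sphere data: one compact coefficient vector per band.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((n_bands, sphere.sum())) \
       + 1j * rng.standard_normal((n_bands, sphere.sum()))

# Scatter each sphere onto the full cubic grid, then apply a 3-D inverse FFT per band.
grid = np.zeros((n_bands, n, n, n), dtype=complex)
grid[:, sphere] = coeffs
real_space = np.fft.ifftn(grid, axes=(1, 2, 3))
print(real_space.shape)                                   # (4, 32, 32, 32)
```

A distributed implementation additionally has to decide how the sphere and the full grid are partitioned across ranks, which is exactly the flexibility the abstract argues for.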
{"title":"Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes","authors":"Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning","doi":"arxiv-2406.05577","DOIUrl":"https://doi.org/arxiv-2406.05577","url":null,"abstract":"Multi-dimensional Fourier transforms are key mathematical building blocks\u0000that appear in a wide range of applications from materials science, physics,\u0000chemistry and even machine learning. Over the past years, a multitude of\u0000software packages targeting distributed multi-dimensional Fourier transforms\u0000have been developed. Most variants attempt to offer efficient implementations\u0000for single transforms applied on data mapped onto rectangular grids. However,\u0000not all scientific applications conform to this pattern, i.e. plane wave\u0000Density Functional Theory codes require multi-dimensional Fourier transforms\u0000applied on data represented as batches of spheres. Typically, the\u0000implementations for this use case are hand-coded and tailored for the\u0000requirements of each application. In this work, we present the Fastest Fourier\u0000Transform from Berkeley (FFTB) a distributed framework that offers flexible\u0000implementations for both regular/non-regular data grids and batched/non-batched\u0000transforms. We provide a flexible implementations with a user-friendly API that\u0000captures most of the use cases. Furthermore, we provide implementations for\u0000both CPU and GPU platforms, showing that our approach offers improved execution\u0000time and scalability on the HP Cray EX supercomputer. In addition, we outline\u0000the need for flexible implementations for different use cases of the software\u0000package.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article presents svds-C, an open-source and high-performance C program for accurately and robustly computing the truncated SVD, i.e., the several largest singular values and the corresponding singular vectors. We have re-implemented the algorithm of MATLAB's svds in C, based on MKL or OpenBLAS and multi-thread computing, to obtain the parallel program named svds-C. Running on a shared-memory computer, svds-C consumes less time and memory than svds thanks to careful implementation of multi-thread parallelization and memory management. Numerical experiments on different test cases, either synthetically generated or drawn from real-world datasets, show that svds-C runs remarkably faster than svds, with an average 4.7X and at most 12X speedup for 16-thread parallel computing on a computer with an Intel CPU, while preserving the same accuracy and consuming about half the memory. Experimental results also demonstrate that svds-C has similar advantages over svds on a computer with an AMD CPU, and outperforms other state-of-the-art algorithms for truncated SVD in computing time and robustness.
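For readers unfamiliar with what svds computes, SciPy's analogous routine shows the semantics of a truncated SVD: only the k largest singular triplets are returned. This is SciPy's svds, used purely as a reference for the problem being solved, not the svds-C interface.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Truncated SVD: compute only the k largest singular triplets of a (sparse) matrix.
A = sparse_random(2000, 1000, density=0.01, format="csr", random_state=0)
k = 5
U, s, Vt = svds(A, k=k)          # SciPy returns singular values in ascending order
order = np.argsort(s)[::-1]      # reorder to largest-first, as MATLAB's svds does
U, s, Vt = U[:, order], s[order], Vt[order, :]

print(s)                          # the k largest singular values
print(U.shape, Vt.shape)          # (2000, 5) and (5, 1000)
```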
{"title":"svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition","authors":"Xu Feng, Wenjian Yu, Yuyang Xie","doi":"arxiv-2405.18966","DOIUrl":"https://doi.org/arxiv-2405.18966","url":null,"abstract":"This article presents svds-C, an open-source and high-performance C program\u0000for accurately and robustly computing truncated SVD, e.g. computing several\u0000largest singular values and corresponding singular vectors. We have\u0000re-implemented the algorithm of svds in Matlab in C based on MKL or OpenBLAS\u0000and multi-thread computing to obtain the parallel program named svds-C. svds-C\u0000running on shared-memory computer consumes less time and memory than svds\u0000thanks to careful implementation of multi-thread parallelization and memory\u0000management. Numerical experiments on different test cases which are\u0000synthetically generated or directly from real world datasets show that, svds-C\u0000runs remarkably faster than svds with averagely 4.7X and at most 12X speedup\u0000for 16-thread parallel computing on a computer with Intel CPU, while preserving\u0000same accuracy and consuming about half memory space. Experimental results also\u0000demonstrate that svds-C has similar advantages over svds on the computer with\u0000AMD CPU, and outperforms other state-of-the-art algorithms for truncated SVD on\u0000computing time and robustness.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141195347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis
In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs), along with the scalable solvers from the Portable, Extensible Toolkit for Scientific Computation (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github.com/zjwegert/GridapTopOpt.jl.
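As background for the level set-based formulation mentioned above, the design domain and its evolution are usually written as follows. This is standard textbook notation for the method class, not code or equations taken from the package.

```latex
% Level-set description of the design: the shape \Omega is the sub-zero set of a scalar
% function \varphi, and the boundary is advected with a normal velocity v obtained from
% shape sensitivities via a Hamilton-Jacobi equation.
\Omega = \{\, x : \varphi(x) < 0 \,\}, \qquad
\partial\Omega = \{\, x : \varphi(x) = 0 \,\}, \qquad
\frac{\partial \varphi}{\partial t} + v\,\lvert \nabla \varphi \rvert = 0 .
```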
{"title":"GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation","authors":"Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis","doi":"arxiv-2405.10478","DOIUrl":"https://doi.org/arxiv-2405.10478","url":null,"abstract":"In this paper we present GridapTopOpt, an extendable framework for level\u0000set-based topology optimisation that can be readily distributed across a\u0000personal computer or high-performance computing cluster. The package is written\u0000in Julia and uses the Gridap package ecosystem for parallel finite element\u0000assembly from arbitrary weak formulations of partial differential equation\u0000(PDEs) along with the scalable solvers from the Portable and Extendable Toolkit\u0000for Scientific Computing (PETSc). The resulting user interface is intuitive and\u0000easy-to-use, allowing for the implementation of a wide range of topology\u0000optimisation problems with a syntax that is near one-to-one with the\u0000mathematical notation. Furthermore, we implement automatic differentiation to\u0000help mitigate the bottleneck associated with the analytic derivation of\u0000sensitivities for complex problems. GridapTopOpt is capable of solving a range\u0000of benchmark and research topology optimisation problems with large numbers of\u0000degrees of freedom. This educational article demonstrates the usability and\u0000versatility of the package by describing the formulation and step-by-step\u0000implementation of several distinct topology optimisation problems. The driver\u0000scripts for these problems are provided and the package source code is\u0000available at https://github$.$com/zjwegert/GridapTopOpt.jl.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces the design and implementation of PyOptInterface, a modeling language for mathematical optimization embedded in the Python programming language. PyOptInterface uses lightweight and compact data structures to efficiently bridge high-level entities in optimization models, such as variables and constraints, to the internal indices of optimizers. It supports a variety of optimization solvers and a range of common problem classes. We provide benchmarks to exhibit the competitive performance of PyOptInterface compared with other state-of-the-art modeling languages.
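The "bridge high-level entities to internal indices" idea can be sketched generically in a few lines of Python. This is a hypothetical illustration of the design pattern described in the abstract, not the actual PyOptInterface API; all class and method names here are invented.

```python
# Hypothetical sketch of the "compact bridge" pattern: lightweight handles on the
# modeling side map directly to integer indices on the solver side.
from dataclasses import dataclass

@dataclass(frozen=True)
class VariableHandle:
    index: int                      # position of the variable inside the solver

class Model:
    def __init__(self):
        self._num_vars = 0
        self._constraints = []      # stored as (coefficients-by-index, sense, rhs)

    def add_variable(self) -> VariableHandle:
        handle = VariableHandle(self._num_vars)
        self._num_vars += 1         # O(1): no name lookup, no heavy per-variable objects
        return handle

    def add_constraint(self, coeffs: dict, sense: str, rhs: float):
        # Translate handle-keyed coefficients into index-keyed ones for the solver.
        self._constraints.append(({v.index: c for v, c in coeffs.items()}, sense, rhs))

# Usage: the constraint x + 2*y <= 10 expressed through handles.
m = Model()
x, y = m.add_variable(), m.add_variable()
m.add_constraint({x: 1.0, y: 2.0}, "<=", 10.0)
print(m._constraints)
```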
{"title":"PyOptInterface: Design and implementation of an efficient modeling language for mathematical optimization","authors":"Yue Yang, Chenhui Lin, Luo Xu, Wenchuan Wu","doi":"arxiv-2405.10130","DOIUrl":"https://doi.org/arxiv-2405.10130","url":null,"abstract":"This paper introduces the design and implementation of PyOptInterface, a\u0000modeling language for mathematical optimization embedded in Python programming\u0000language. PyOptInterface uses lightweight and compact data structure to bridge\u0000high-level entities in optimization models like variables and constraints to\u0000internal indices of optimizers efficiently. It supports a variety of\u0000optimization solvers and a range of common problem classes. We provide\u0000benchmarks to exhibit the competitive performance of PyOptInterface compared\u0000with other state-of-the-art modeling languages.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In shared-memory parallel automatic differentiation, shared inputs among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoint variables. In particular, we assess the performance of mapped local adjoints in discrete adjoint computations in the multiphysics simulation suite SU2.
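The race described above and the local-adjoint remedy can be pictured with a deliberately simplified Python sketch: two concurrent preaccumulations both contribute to the adjoint of a shared input. This is only an illustration of the idea, not CoDiPack/SU2 code; all names are made up.

```python
# Two thread-local preaccumulations both depend on the shared input x, so both want to
# increment its adjoint. With local adjoints each task writes into its own dictionary,
# and the contributions are merged afterwards, race-free.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

x_index = 0                          # index of the shared input variable

def preaccumulate(local_jacobian_entry, seed):
    local_adjoints = defaultdict(float)              # local adjoints, owned by this task
    local_adjoints[x_index] += local_jacobian_entry * seed
    return local_adjoints

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(preaccumulate, [2.0, 3.0], [1.0, 1.0]))

shared_adjoints = defaultdict(float)
for local in results:                # deterministic, sequential merge
    for idx, value in local.items():
        shared_adjoints[idx] += value
print(shared_adjoints[x_index])      # 5.0; a single shared adjoint vector would need
                                     # atomic updates to produce this safely in parallel
```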
{"title":"Local Adjoints for Simultaneous Preaccumulations with Shared Inputs","authors":"Johannes Blühdorn, Nicolas R. Gauger","doi":"arxiv-2405.07819","DOIUrl":"https://doi.org/arxiv-2405.07819","url":null,"abstract":"In shared-memory parallel automatic differentiation, shared inputs among\u0000simultaneous thread-local preaccumulations lead to data races if Jacobians are\u0000accumulated with a single, shared vector of adjoint variables. In this work, we\u0000discuss the benefits and tradeoffs of re-enabling such preaccumulations by a\u0000transition to suitable local adjoint variables. In particular, we assess the\u0000performance of mapped local adjoints in discrete adjoint computations in the\u0000multiphysics simulation suite SU2.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger
The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.
{"title":"Hybrid Parallel Discrete Adjoints in SU2","authors":"Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger","doi":"arxiv-2405.06056","DOIUrl":"https://doi.org/arxiv-2405.06056","url":null,"abstract":"The open-source multiphysics suite SU2 features discrete adjoints by means of\u0000operator overloading automatic differentiation (AD). While both primal and\u0000discrete adjoint solvers support MPI parallelism, hybrid parallelism using both\u0000MPI and OpenMP has only been introduced for the primal solvers so far. In this\u0000work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with\u0000OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP\u0000parallelism, marks a key step in this endeavour. We identify the affected parts\u0000of SU2's advanced AD workflow and discuss the required changes and their\u0000tradeoffs. Detailed performance studies compare MPI parallel and hybrid\u0000parallel discrete adjoints in terms of memory and runtime and unveil key\u0000performance characteristics. We showcase the effectiveness of performance\u0000optimizations and highlight perspectives for future improvements. At the same\u0000time, this study demonstrates the applicability of OpDiLib in a large code base\u0000and its scalability on large test cases, providing valuable insights for future\u0000applications both within and beyond SU2.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat
Sparse tensor operations are gaining attention in emerging applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle for research in sparse tensor operations is the lack of a broad-scale sparse tensor dataset. Another challenge is examining sparse tensor features, which are not only important for revealing the nonzero pattern but also have a significant impact on determining the best-suited storage format, decomposition algorithm, and reordering method. However, due to the large sizes of real tensors, even extracting these features becomes costly without care. To address these gaps in the literature, we have developed a smart sparse tensor generator that mimics the substantial features of real sparse tensors. Moreover, we propose various methods for efficiently extracting an extensive set of features for sparse tensors. The effectiveness of our generator is validated through the quality of the features and the performance of decomposition on the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source, with all artifacts available at https://github.com/sparcityeu/feaTen and https://github.com/sparcityeu/genTen, respectively.
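A few of the kinds of features the abstract refers to (density, per-fiber nonzero counts and their imbalance) can be computed directly from COO data with NumPy. This is a generic sketch of such feature extraction, not the feaTen tool; the feature names below are illustrative only.

```python
import numpy as np

# Generic sketch: basic features of a random 3-way sparse tensor stored in COO form.
rng = np.random.default_rng(0)
dims = (100, 80, 60)
nnz = 5000
coords = np.stack([rng.integers(0, d, nnz) for d in dims])   # shape (3, nnz)
values = rng.random(nnz)

density = nnz / np.prod(dims)

# Mode-0 fibers are indexed by the pair (i1, i2); count nonzeros per fiber.
fiber_ids = coords[1] * dims[2] + coords[2]
_, fiber_nnz = np.unique(fiber_ids, return_counts=True)

features = {
    "nnz": nnz,
    "density": density,
    "mode0_fiber_nnz_mean": float(fiber_nnz.mean()),
    "mode0_fiber_nnz_max": int(fiber_nnz.max()),
    "mode0_fiber_cv": float(fiber_nnz.std() / fiber_nnz.mean()),   # imbalance indicator
}
print(features)
```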
{"title":"A Sparse Tensor Generator with Efficient Feature Extraction","authors":"Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat","doi":"arxiv-2405.04944","DOIUrl":"https://doi.org/arxiv-2405.04944","url":null,"abstract":"Sparse tensor operations are gaining attention in emerging applications such\u0000as social networks, deep learning, diagnosis, crime, and review analysis.\u0000However, a major obstacle for research in sparse tensor operations is the\u0000deficiency of a broad-scale sparse tensor dataset. Another challenge in sparse\u0000tensor operations is examining the sparse tensor features, which are not only\u0000important for revealing its nonzero pattern but also have a significant impact\u0000on determining the best-suited storage format, the decomposition algorithm, and\u0000the reordering methods. However, due to the large sizes of real tensors, even\u0000extracting these features becomes costly without caution. To address these gaps\u0000in the literature, we have developed a smart sparse tensor generator that\u0000mimics the substantial features of real sparse tensors. Moreover, we propose\u0000various methods for efficiently extracting an extensive set of features for\u0000sparse tensors. The effectiveness of our generator is validated through the\u0000quality of features and the performance of decomposition in the generated\u0000tensors. Both the sparse tensor feature extractor and the tensor generator are\u0000open source with all the artifacts available at\u0000https://github.com/sparcityeu/feaTen and https://github.com/sparcityeu/genTen,\u0000respectively.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is an important algorithm for hierarchical matrices as well. However, due to its low computational intensity, its performance is typically limited by the available memory bandwidth. By optimizing the storage representation of the data within such matrices, this limitation can be lifted and the performance increased. This applies not only to hierarchical matrices but also to other low-rank approximation schemes, e.g. block low-rank matrices.
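The bandwidth argument can be seen on a single low-rank block: the matvec streams the stored factors, so storing them in a lower precision roughly halves the bytes moved at the cost of a small error. The NumPy sketch below illustrates this trade-off under that assumption; it is not the H-matrix library from the paper.

```python
import numpy as np

# One rank-k block stored as factors U, V; the matvec y = U (V^T x) streams those factors.
rng = np.random.default_rng(0)
n, k = 4096, 16
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
x = rng.standard_normal(n)

y_fp64 = U @ (V.T @ x)                                   # reference with float64 factors

# "Compressed" storage: keep the factors in float32, decompress on the fly for the matvec.
U32, V32 = U.astype(np.float32), V.astype(np.float32)
y_fp32 = U32.astype(np.float64) @ (V32.T.astype(np.float64) @ x)

bytes_fp64 = U.nbytes + V.nbytes
bytes_fp32 = U32.nbytes + V32.nbytes
rel_err = np.linalg.norm(y_fp64 - y_fp32) / np.linalg.norm(y_fp64)
print(bytes_fp64, bytes_fp32, rel_err)                   # ~2x less data, ~1e-7 relative error
```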
{"title":"Performance of H-Matrix-Vector Multiplication with Floating Point Compression","authors":"Ronald Kriemann","doi":"arxiv-2405.03456","DOIUrl":"https://doi.org/arxiv-2405.03456","url":null,"abstract":"Matrix-vector multiplication forms the basis of many iterative solution\u0000algorithms and as such is an important algorithm also for hierarchical\u0000matrices. However, due to its low computational intensity, its performance is\u0000typically limited by the available memory bandwidth. By optimizing the storage\u0000representation of the data within such matrices, this limitation can be lifted\u0000and the performance increased. This applies not only to hierarchical matrices\u0000but for also for other low-rank approximation schemes, e.g. block low-rank\u0000matrices.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140883052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This contribution examines the capabilities of the Python ecosystem for solving nonlinear energy minimization problems, with a particular focus on transitioning from traditional MATLAB methods to Python's advanced computational tools, such as automatic differentiation. We demonstrate Python's streamlined approach to minimizing nonlinear energies by analyzing three benchmark problems: the p-Laplacian, the Ginzburg-Landau model, and Neo-Hookean hyperelasticity. This approach merely requires the provision of the energy functional itself, making it a simple and efficient way to solve this category of problems. The results show that the Python implementation is about ten times faster than the MATLAB implementation for large-scale problems. Our findings highlight Python's efficiency and ease of use in scientific computing, establishing it as a preferable choice for implementing sophisticated mathematical models and accelerating the development of numerical simulations.
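The "only the energy functional is provided" workflow can be sketched on a tiny 1-D p-Laplacian analogue: write down the discretized energy, obtain its gradient by automatic differentiation, and hand both to a standard minimizer. This is a minimal sketch assuming jax and scipy are available, not the paper's FEM-based code.

```python
import numpy as np
import jax
import jax.numpy as jnp
from scipy.optimize import minimize

jax.config.update("jax_enable_x64", True)

# 1-D p-Laplacian energy on (0, 1) with homogeneous Dirichlet BCs and constant load f.
p, n = 3.0, 101
h = 1.0 / (n - 1)
f = 1.0

def energy(u_inner):
    u = jnp.concatenate([jnp.zeros(1), u_inner, jnp.zeros(1)])   # enforce u(0) = u(1) = 0
    du = (u[1:] - u[:-1]) / h                                    # per-element gradient
    return h * jnp.sum(jnp.abs(du) ** p) / p - h * f * jnp.sum(u_inner)

grad_energy = jax.jit(jax.grad(energy))                          # gradient via AD, no hand derivation

u0 = np.zeros(n - 2)
res = minimize(lambda u: float(energy(u)), u0,
               jac=lambda u: np.asarray(grad_energy(u)),
               method="L-BFGS-B")
print(res.fun, res.nit)                                          # minimal energy, iteration count
```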
{"title":"Minimization of Nonlinear Energies in Python Using FEM and Automatic Differentiation Tools","authors":"Michal Béreš, Jan Valdman","doi":"arxiv-2407.04706","DOIUrl":"https://doi.org/arxiv-2407.04706","url":null,"abstract":"This contribution examines the capabilities of the Python ecosystem to solve\u0000nonlinear energy minimization problems, with a particular focus on\u0000transitioning from traditional MATLAB methods to Python's advanced\u0000computational tools, such as automatic differentiation. We demonstrate Python's\u0000streamlined approach to minimizing nonlinear energies by analyzing three\u0000problem benchmarks - the p-Laplacian, the Ginzburg-Landau model, and the\u0000Neo-Hookean hyperelasticity. This approach merely requires the provision of the\u0000energy functional itself, making it a simple and efficient way to solve this\u0000category of problems. The results show that the implementation is about ten\u0000times faster than the MATLAB implementation for large-scale problems. Our\u0000findings highlight Python's efficiency and ease of use in scientific computing,\u0000establishing it as a preferable choice for implementing sophisticated\u0000mathematical models and accelerating the development of numerical simulations.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141571851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From FORTRAN to NumPy, arrays have revolutionized how we express computation. However, arrays in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real-world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the array structures and program control flow they support in order to simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model that resolves the challenges of computing over structured arrays by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of array structures, such as sparsity, run-length encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, graph analytics, and a high-level tensor operator fusion interface.
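As a minimal point of reference for why exploiting structural zeros pays off, the same SpMV can be timed with and without a structure-aware format. This uses generic SciPy as a stand-in and says nothing about Finch's own syntax or compiler.

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

# The dense matvec touches every stored zero; the CSR matvec skips structural zeros.
rng = np.random.default_rng(0)
n = 2000
A_sparse = sparse_random(n, n, density=0.001, format="csr", random_state=0)
A_dense = A_sparse.toarray()
x = rng.standard_normal(n)

t0 = time.perf_counter(); y_dense = A_dense @ x; t1 = time.perf_counter()
t2 = time.perf_counter(); y_sparse = A_sparse @ x; t3 = time.perf_counter()

print(np.allclose(y_dense, y_sparse))                    # same result
print(f"dense: {t1 - t0:.6f}s  sparse: {t3 - t2:.6f}s")  # sparse is far cheaper at this density
```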
{"title":"Finch: Sparse and Structured Array Programming with Control Flow","authors":"Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Changwan Hong, Saman Amarasinghe","doi":"arxiv-2404.16730","DOIUrl":"https://doi.org/arxiv-2404.16730","url":null,"abstract":"From FORTRAN to NumPy, arrays have revolutionized how we express computation.\u0000However, arrays in these, and almost all prominent systems, can only handle\u0000dense rectilinear integer grids. Real world arrays often contain underlying\u0000structure, such as sparsity, runs of repeated values, or symmetry. Support for\u0000structured data is fragmented and incomplete. Existing frameworks limit the\u0000array structures and program control flow they support to better simplify the\u0000problem. In this work, we propose a new programming language, Finch, which supports\u0000both flexible control flow and diverse data structures. Finch facilitates a\u0000programming model which resolves the challenges of computing over structured\u0000arrays by combining control flow and data structures into a common\u0000representation where they can be co-optimized. Finch automatically specializes\u0000control flow to data so that performance engineers can focus on experimenting\u0000with many algorithms. Finch supports a familiar programming language of loops,\u0000statements, ifs, breaks, etc., over a wide variety of array structures, such as\u0000sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch\u0000reliably utilizes the key properties of structure, such as structural zeros,\u0000repeated values, or clustered non-zeros. We show that this leads to dramatic\u0000speedups in operations such as SpMV and SpGEMM, image processing, graph\u0000analytics, and a high-level tensor operator fusion interface.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}