
arXiv - CS - Mathematical Software: Latest Papers

Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes
Pub Date : 2024-06-08 DOI: arxiv-2406.05577
Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning
Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications, from materials science, physics, and chemistry to machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied to data mapped onto rectangular grids. However, not all scientific applications conform to this pattern; for example, plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied to data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored to the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide flexible implementations with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.
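The abstract does not show FFTB's API, so the following is only a minimal sketch of the general idea that distributed multi-dimensional FFT packages build on: a 2D transform can be computed as a batch of 1D transforms over rows, a transpose, and a batch of 1D transforms over columns. The function names are hypothetical, a naive O(n^2) DFT stands in for a real FFT kernel, and the grid is a small rectangle rather than the sphere-shaped data the paper targets.

```python
import cmath


def dft(x):
    """Naive O(n^2) one-dimensional discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft2_direct(a):
    """Two-dimensional DFT evaluated straight from the definition."""
    rows, cols = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (r * u / rows + c * v / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


def dft2_row_column(a):
    """Same transform as a batch of 1D DFTs over rows, then over columns."""
    row_pass = [dft(row) for row in a]                  # batch of row transforms
    transposed = [list(col) for col in zip(*row_pass)]  # the transpose step
    col_pass = [dft(col) for col in transposed]         # batch of column transforms
    return [list(row) for row in zip(*col_pass)]        # restore the original layout


def max_abs_diff(p, q):
    """Largest entry-wise difference between two 2D arrays."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))


grid = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]
error = max_abs_diff(dft2_direct(grid), dft2_row_column(grid))
print(f"max difference between the two evaluations: {error:.2e}")
```

In a distributed setting the transpose is the step that turns into communication between processes, which is why the data layout (rectangular grid or batch of spheres) matters so much for performance.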
Citations: 0
svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition
Pub Date : 2024-05-29 DOI: arxiv-2405.18966
Xu Feng, Wenjian Yu, Yuyang Xie
This article presents svds-C, an open-source, high-performance C program for accurately and robustly computing a truncated SVD, e.g. computing the several largest singular values and the corresponding singular vectors. We have re-implemented the algorithm of Matlab's svds in C, based on MKL or OpenBLAS and multi-thread computing, to obtain the parallel program named svds-C. Running on a shared-memory computer, svds-C consumes less time and memory than svds thanks to a careful implementation of multi-thread parallelization and memory management. Numerical experiments on different test cases, which are synthetically generated or taken directly from real-world datasets, show that svds-C runs remarkably faster than svds, with an average speedup of 4.7X and a maximum speedup of 12X for 16-thread parallel computing on a computer with an Intel CPU, while preserving the same accuracy and consuming about half the memory space. Experimental results also demonstrate that svds-C has similar advantages over svds on a computer with an AMD CPU, and that it outperforms other state-of-the-art algorithms for truncated SVD in computing time and robustness.
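To make "truncated SVD" concrete, here is a small sketch that computes only the largest singular value and its singular vectors. It uses plain power iteration on A^T A, which is a swapped-in textbook technique and not the algorithm of svds or svds-C; all function names are illustrative.

```python
import math


def matvec(a, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, y):
    """Compute A^T y without forming the transpose."""
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def top_singular_triplet(a, iterations=200):
    """Largest singular value with left/right vectors via power iteration."""
    v = [1.0] * len(a[0])                 # deterministic start vector
    for _ in range(iterations):
        w = rmatvec(a, matvec(a, v))      # one application of A^T A
        scale = norm(w)
        v = [wi / scale for wi in w]
    av = matvec(a, v)
    sigma = norm(av)                      # sigma = ||A v|| for unit v
    u = [x / sigma for x in av]
    return sigma, u, v


a = [[2.0, 0.0], [1.0, 2.0]]
sigma, u, v = top_singular_triplet(a)
print(sigma)
```

For this 2x2 matrix the exact largest singular value is (1 + sqrt(17)) / 2, which gives a convenient check. Production codes compute several singular triplets at once and converge far faster than this sketch.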
Citations: 0
GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation
Pub Date : 2024-05-17 DOI: arxiv-2405.10478
Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis
In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs), along with the scalable solvers from the Portable, Extensible Toolkit for Scientific Computation (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github.com/zjwegert/GridapTopOpt.jl.
Citations: 0
PyOptInterface: Design and implementation of an efficient modeling language for mathematical optimization
Pub Date : 2024-05-16 DOI: arxiv-2405.10130
Yue Yang, Chenhui Lin, Luo Xu, Wenchuan Wu
This paper introduces the design and implementation of PyOptInterface, a modeling language for mathematical optimization embedded in the Python programming language. PyOptInterface uses lightweight and compact data structures to efficiently bridge high-level entities in optimization models, such as variables and constraints, to the internal indices of optimizers. It supports a variety of optimization solvers and a range of common problem classes. We provide benchmarks to exhibit the competitive performance of PyOptInterface compared with other state-of-the-art modeling languages.
Citations: 0
Local Adjoints for Simultaneous Preaccumulations with Shared Inputs
Pub Date : 2024-05-13 DOI: arxiv-2405.07819
Johannes Blühdorn, Nicolas R. Gauger
In shared-memory parallel automatic differentiation, shared inputs among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoint variables. In particular, we assess the performance of mapped local adjoints in discrete adjoint computations in the multiphysics simulation suite SU2.
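The data race described above can be illustrated with a toy reverse sweep. In this sketch, two threads preaccumulate the Jacobians of two recorded sections that share the input x. Instead of writing into one global adjoint vector, where both threads would update the entry of x at the same time, each sweep keeps its adjoints in a private map keyed by variable identifier. Reading "mapped local adjoints" this way is an assumption, and the tape format, identifiers and function names are all hypothetical; this is not the SU2 or AD-tool implementation.

```python
import threading


def preaccumulate(tape, output, inputs):
    """Reverse sweep over one recorded section using a local adjoint map.

    tape:   list of (lhs, [(rhs, partial), ...]) statements in recording order.
    Returns d(output)/d(input) for every requested input identifier.
    """
    adjoints = {output: 1.0}              # private to this call, so no sharing
    for lhs, args in reversed(tape):
        lhs_bar = adjoints.pop(lhs, 0.0)
        for rhs, partial in args:
            adjoints[rhs] = adjoints.get(rhs, 0.0) + partial * lhs_bar
    return {i: adjoints.get(i, 0.0) for i in inputs}


# Identifiers: x=0 (shared input, value 3), t1=1, y1=2, t2=3, y2=4.
# Section A: t1 = x*x, y1 = 5*t1        -> dy1/dx = 10*x = 30
# Section B: t2 = x+x, y2 = t2*x        -> dy2/dx = 4*x  = 12
tape_a = [(1, [(0, 6.0)]), (2, [(1, 5.0)])]
tape_b = [(3, [(0, 1.0), (0, 1.0)]), (4, [(3, 3.0), (0, 6.0)])]

results = {}


def worker(name, tape, output):
    results[name] = preaccumulate(tape, output, inputs=[0])


threads = [
    threading.Thread(target=worker, args=("A", tape_a, 2)),
    threading.Thread(target=worker, args=("B", tape_b, 4)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The cost of this design is the map lookup on every adjoint access, which is the kind of tradeoff the abstract says the paper evaluates.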
Citations: 0
Hybrid Parallel Discrete Adjoints in SU2
Pub Date : 2024-05-09 DOI: arxiv-2405.06056
Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger
The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.
Citations: 0
A Sparse Tensor Generator with Efficient Feature Extraction
Pub Date : 2024-05-08 DOI: arxiv-2405.04944
Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat
Sparse tensor operations are gaining attention in emerging applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle for research in sparse tensor operations is the lack of a broad-scale sparse tensor dataset. Another challenge in sparse tensor operations is examining the sparse tensor features, which are not only important for revealing its nonzero pattern but also have a significant impact on determining the best-suited storage format, the decomposition algorithm, and the reordering methods. However, due to the large sizes of real tensors, even extracting these features becomes costly without caution. To address these gaps in the literature, we have developed a smart sparse tensor generator that mimics the substantial features of real sparse tensors. Moreover, we propose various methods for efficiently extracting an extensive set of features for sparse tensors. The effectiveness of our generator is validated through the quality of features and the performance of decomposition in the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source, with all the artifacts available at https://github.com/sparcityeu/feaTen and https://github.com/sparcityeu/genTen, respectively.
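As a rough illustration of what "features" of a sparse tensor can mean, the sketch below computes a few simple statistics directly from a coordinate (COO) list of nonzeros, so the work scales with the number of nonzeros rather than with the full tensor size. The chosen features and their names are assumptions made for this example; they are not the feature set or the interface of feaTen.

```python
from collections import Counter


def tensor_features(shape, coords):
    """Simple structural features of a sparse tensor in COO form.

    shape:  tuple with the extent of each mode.
    coords: iterable of index tuples, one per nonzero.
    """
    coords = list(coords)
    nnz = len(coords)
    total = 1
    for extent in shape:
        total *= extent
    features = {"nnz": nnz, "density": nnz / total}
    for mode in range(len(shape)):
        # Nonzeros grouped by their index in this mode.
        per_index = Counter(c[mode] for c in coords)
        features[f"mode{mode}_nonempty_slices"] = len(per_index)
        features[f"mode{mode}_max_nnz_per_slice"] = max(per_index.values(), default=0)
        # A fiber along `mode` is identified by all the other indices.
        fibers = {c[:mode] + c[mode + 1:] for c in coords}
        features[f"mode{mode}_nonempty_fibers"] = len(fibers)
    return features


shape = (3, 4, 2)
coords = [(0, 0, 0), (0, 1, 1), (2, 3, 0), (2, 3, 1)]
features = tensor_features(shape, coords)
for name, value in features.items():
    print(name, value)
```

Statistics of this kind (how nonzeros spread over slices and fibers) are what one would look at when choosing a storage format or a reordering, as the abstract points out.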
Citations: 0
Performance of H-Matrix-Vector Multiplication with Floating Point Compression
Pub Date : 2024-05-06 DOI: arxiv-2405.03456
Ronald Kriemann
Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is also an important algorithm for hierarchical matrices. However, due to its low computational intensity, its performance is typically limited by the available memory bandwidth. By optimizing the storage representation of the data within such matrices, this limitation can be lifted and the performance increased. This applies not only to hierarchical matrices but also to other low-rank approximation schemes, e.g. block low-rank matrices.
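The abstract does not name the compression schemes it uses, so the sketch below substitutes the simplest possible one as an assumption: the factors of a low-rank block A ≈ U V^T are stored in single precision (4 bytes per entry) while the arithmetic is carried out in double precision. The class and method names are hypothetical. The point is that a memory-bound product reads fewer bytes per entry and per block.

```python
from array import array


class LowRankBlock:
    """Block A ~ U V^T whose factors are stored in single precision."""

    def __init__(self, u, v):
        # u has m rows of length k, v has n rows of length k.
        self.k = len(u[0])
        self.u = [array("f", row) for row in u]
        self.v = [array("f", row) for row in v]

    def matvec(self, x):
        """y = U (V^T x); element reads are widened to Python floats (double)."""
        t = [sum(row[j] * xi for row, xi in zip(self.v, x)) for j in range(self.k)]
        return [sum(row[j] * t[j] for j in range(self.k)) for row in self.u]

    def nbytes(self):
        """Bytes used by the stored factors."""
        return sum(r.itemsize * len(r) for r in self.u + self.v)


# Rank-1 block of size 3 x 2; the values are exactly representable in float32.
block = LowRankBlock(u=[[1.0], [2.0], [3.0]], v=[[0.5], [0.25]])
y = block.matvec([2.0, 4.0])
print(y, block.nbytes())
```

For an m x n block of rank k the factored form stores k(m + n) numbers instead of mn, and reduced-precision storage shrinks each of those further, at the price of a rounding error that has to stay below the low-rank approximation error.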
Citations: 0
Minimization of Nonlinear Energies in Python Using FEM and Automatic Differentiation Tools
Pub Date : 2024-05-03 DOI: arxiv-2407.04706
Michal Béreš, Jan Valdman
This contribution examines the capabilities of the Python ecosystem to solve nonlinear energy minimization problems, with a particular focus on transitioning from traditional MATLAB methods to Python's advanced computational tools, such as automatic differentiation. We demonstrate Python's streamlined approach to minimizing nonlinear energies by analyzing three problem benchmarks - the p-Laplacian, the Ginzburg-Landau model, and the Neo-Hookean hyperelasticity. This approach merely requires the provision of the energy functional itself, making it a simple and efficient way to solve this category of problems. The results show that the implementation is about ten times faster than the MATLAB implementation for large-scale problems. Our findings highlight Python's efficiency and ease of use in scientific computing, establishing it as a preferable choice for implementing sophisticated mathematical models and accelerating the development of numerical simulations.
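The claim that only the energy functional has to be supplied can be sketched in a few lines. The example below writes down a discrete 1D p-Laplacian energy on (0, 1) with zero boundary values and obtains its gradient by automatic differentiation. Two stand-ins are assumptions of this sketch and not the paper's tooling: a tiny forward-mode dual-number class replaces the AD library, and fixed-step gradient descent replaces the optimizer. The step size is tuned for the small grid used here.

```python
class Dual:
    """Number val + der*eps with eps**2 == 0; `der` carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val, self.val * other.der + self.der * other.val)

    def abs_pow(self, p):
        """|x|**p for p >= 2, with derivative p * |x|**(p-2) * x."""
        a = abs(self.val)
        return Dual(a ** p, p * a ** (p - 2) * self.val * self.der)


def energy(u, p, h, f):
    """Discrete energy: sum of (h/p)|slope|^p per cell minus f*h*u_i per node."""
    full = [0.0] + list(u) + [0.0]        # Dirichlet values u(0) = u(1) = 0
    total = Dual(0.0)
    for left, right in zip(full, full[1:]):
        slope = (Dual.lift(right) - left) * (1.0 / h)
        total = total + slope.abs_pow(p) * (h / p)
    for ui in u:
        total = total - Dual.lift(ui) * (f * h)
    return total


def gradient(u, p, h, f):
    """One forward-mode pass per unknown, seeding that unknown with der = 1."""
    grad = []
    for i in range(len(u)):
        seeded = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(u)]
        grad.append(energy(seeded, p, h, f).der)
    return grad


def minimize(n, p=2.0, f=1.0, step=0.05, iterations=2000):
    """Fixed-step gradient descent from u = 0 on n interior nodes."""
    h = 1.0 / (n + 1)
    u = [0.0] * n
    for _ in range(iterations):
        g = gradient(u, p, h, f)
        u = [ui - step * gi for ui, gi in zip(u, g)]
    return u


solution = minimize(3)
print(solution)
```

For p = 2 and f = 1 the minimizer is the finite-difference solution of -u'' = 1, which coincides with x(1 - x)/2 at the nodes and gives a simple correctness check. Forward mode needs one pass per unknown, so large problems call for reverse-mode AD and a Newton-type or quasi-Newton solver.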
Citations: 0
Finch: Sparse and Structured Array Programming with Control Flow
Pub Date : 2024-04-25 DOI: arxiv-2404.16730
Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Changwan Hong, Saman Amarasinghe
From FORTRAN to NumPy, arrays have revolutionized how we express computation. However, arrays in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the array structures and program control flow they support to better simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured arrays by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of array structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, graph analytics, and a high-level tensor operator fusion interface.
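To show what specializing control flow to data structure buys, the following hand-written sketch computes the dot product of a run-length-encoded vector with a sparse vector. It performs one multiplication per run and skips runs of zeros entirely. This is the kind of loop a performance engineer would otherwise write by hand for each format pair; it is Python rather than Finch syntax, and the data layout and function name are assumptions of the example.

```python
def rle_dot_sparse(runs, sparse):
    """Dot product of a run-length-encoded vector with a sparse vector.

    runs:   list of (value, length) pairs covering the whole vector.
    sparse: list of (index, value) pairs for the nonzeros, sorted by index.
    """
    total = 0.0
    pos = 0      # cursor into `sparse`
    start = 0    # first index covered by the current run
    for value, length in runs:
        stop = start + length
        if value != 0.0:
            partial = 0.0
            while pos < len(sparse) and sparse[pos][0] < stop:
                partial += sparse[pos][1]
                pos += 1
            total += value * partial      # one multiply per run
        else:
            while pos < len(sparse) and sparse[pos][0] < stop:
                pos += 1                  # skip the nonzeros under a zero run
        start = stop
    return total


runs = [(2.0, 3), (0.0, 4), (5.0, 2)]        # dense: 2 2 2 0 0 0 0 5 5
sparse = [(1, 1.0), (4, 7.0), (8, 3.0)]      # dense: 0 1 0 0 7 0 0 0 3
result = rle_dot_sparse(runs, sparse)
print(result)
```

The work is proportional to the number of runs plus the number of stored nonzeros, not to the dense length of the vectors.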
Citations: 0