
ACM Transactions on Mathematical Software: Latest Articles

Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-19 DOI: https://dl.acm.org/doi/10.1145/3555353
João Nuno Ferreira Alves, Luís Manuel Silveira Russo, Alexandre Francisco

This article presents a fast SIMD Hilbert space-filling curve generator, which supports a new cache-oblivious blocking-scheme technique applied to the out-of-place transposition of general matrices. Matrix operations found in high-performance computing libraries are usually parameterized based on host microprocessor specifications to minimize data movement between the levels of the memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations. This type of algorithm provides an elegant and portable solution to address the lack of standardization in modern-day processors. Our solution consists of an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. This scheme traverses the input matrix in O(nm) time and space, improving the behavior of matrix algorithms that inherently present poor memory locality. The application of this technique to the problem of out-of-place matrix transposition achieved competitive results when compared to state-of-the-art approaches. The performance of our solution surpassed that of the Intel MKL version after we employed standard software prefetching techniques.
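
The idea of visiting matrix tiles in Hilbert-curve order can be illustrated with a small Python sketch. This is not the authors' SIMD generator: it uses the textbook index-to-coordinate conversion, and the names `hilbert_d2xy`, `transpose_hilbert`, and the `tile` parameter are hypothetical. The tile grid is padded to a power-of-two square, so for strongly rectangular matrices this sketch visits more tiles than a tuned implementation would.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve of the given order to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def transpose_hilbert(a, tile=2):
    """Out-of-place transpose of a list-of-lists matrix, tile by tile.

    Tiles are visited in Hilbert order so that consecutive tiles are
    spatially adjacent in both the input and the output.
    """
    rows, cols = len(a), len(a[0])
    out = [[None] * rows for _ in range(cols)]
    tiles_r = -(-rows // tile)  # ceiling division
    tiles_c = -(-cols // tile)
    # Smallest curve order whose 2^order x 2^order grid covers all tiles.
    order = (max(tiles_r, tiles_c) - 1).bit_length()
    for d in range(4 ** order):
        bx, by = hilbert_d2xy(order, d)
        if by >= tiles_r or bx >= tiles_c:
            continue  # padding tile outside the matrix
        for i in range(by * tile, min((by + 1) * tile, rows)):
            for j in range(bx * tile, min((bx + 1) * tile, cols)):
                out[j][i] = a[i][j]
    return out
```

Because successive curve indices always map to neighbouring tiles, the working set changes gradually regardless of cache size, which is the property the abstract relies on.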

Citations: 0
Remark on Algorithm 1010: Boosting Efficiency in Solving Quartic Equations with No Compromise in Accuracy
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-19 DOI: https://dl.acm.org/doi/10.1145/3564270
Cristiano De Michele

We present a correction and an improvement to Algorithm 1010 [A. Orellana and C. De Michele 2020].

Citations: 0
Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-19 DOI: https://dl.acm.org/doi/10.1145/3560262
Eric Phipps, Roger Pawlowski, Christian Trott

Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, with numerous software tools available for incorporating AD technology into complex applications. However, a growing challenge for AD is the efficient differentiation of parallel computations implemented on emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators as these devices become more pervasive. In this work, we explore forward mode, operator overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool providing APIs for implementing parallel computations that is portable to a wide variety of emerging architectures. We describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns as well as expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We then conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
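
Forward-mode AD by operator overloading can be shown in a few lines with dual numbers. The sketch below is in Python rather than C++ and does not use the Sacado or Kokkos APIs; the class `Dual` and the helpers `sin` and `derivative` are hypothetical names chosen for illustration only.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their derivative is zero.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def sin(x):
    """Sine overloaded for Dual arguments (chain rule)."""
    x = Dual.lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).dot
```

Calling `derivative(lambda x: 3 * x * x + sin(x), 2.0)` propagates the derivative alongside the value through every overloaded operation, so the user code itself is unchanged apart from its argument type.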

Citations: 0
DIRECTGO: A New DIRECT-Type MATLAB Toolbox for Derivative-Free Global Optimization
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-19 DOI: https://dl.acm.org/doi/10.1145/3559755
Linas Stripinis, Remigijus Paulavičius

In this work, we introduce DIRECTGO, a new MATLAB toolbox for derivative-free global optimization. DIRECTGO collects various deterministic derivative-free DIRECT-type algorithms for box-constrained problems, generally constrained problems, and problems with hidden constraints. Each sequential algorithm is implemented in two ways: using static and dynamic data structures for more efficient information storage and organization. Furthermore, parallel schemes are applied to some promising algorithms within DIRECTGO. The toolbox is equipped with a graphical user interface (GUI), ensuring the user-friendly use of all functionalities available in DIRECTGO. Available features are demonstrated in detailed computational studies using a comprehensive DIRECTGOLib v1.0 library of global optimization test problems. Additionally, 11 classical engineering design problems illustrate the potential of DIRECTGO to solve challenging real-world problems. Finally, the appendix gives examples of accompanying MATLAB programs and provides a synopsis of its use on the test problems with box and general constraints.

Citations: 0
Waveform Relaxation with Asynchronous Time-integration
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-19 DOI: https://dl.acm.org/doi/10.1145/3569578
Peter Meisrimel, Philipp Birken

We consider Waveform Relaxation (WR) methods for parallel and partitioned time-integration of surface-coupled multiphysics problems. WR allows independent time-discretizations on independent and adaptive time-grids, while maintaining high time-integration orders. Classical WR methods such as Jacobi or Gauss-Seidel WR are typically either parallel or converge quickly.

We present a novel parallel WR method utilizing asynchronous communication techniques to get both properties. Classical WR methods exchange discrete functions after time-integration of a subproblem. We instead asynchronously exchange time-point solutions during time-integration and directly incorporate all new information in the interpolants. We show both continuous and time-discrete convergence in a framework that generalizes existing linear WR convergence theory. An algorithm for choosing optimal relaxation in our new WR method is presented.

Convergence is demonstrated in two conjugate heat transfer examples. Our new method shows an improved performance over classical WR methods. In one example, we show a partitioned coupling of the compressible Euler equations with a nonlinear heat equation, with subproblems implemented using the open source libraries DUNE and FEniCS.

Citations: 0
Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-16 DOI: 10.1145/3576920
Thierry Fahmy
The robust scale estimator Qn developed by Croux and Rousseeuw [3], for the computation of which they provided a deterministic algorithm, has proven to be very useful in several domains, including quality management and time series analysis. It has interesting mathematical (50% breakdown, 82% Asymptotic Relative Efficiency) and computing (O(n log n) time, O(n) space) properties. While working on a faster algorithm to compute Qn, we discovered an error in the computation of the d constant and, as a consequence, in the dn constants that are used to scale the statistic for consistency with the variance of a normal sample. These errors have been reproduced in several articles, including the International Organization for Standardization (ISO) 13528 [12] document. In this article, we fix the errors and present a new approach, which includes a new algorithm, allowing computations to run 1.3 to 4.5 times faster when n grows from 10 to 100,000.
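
For orientation, the order statistic underlying Qn can be computed naively as follows. This is a quadratic-cost reference sketch, not the accelerated algorithm of the article, and the function name `qn_unscaled` is hypothetical. It relies on the standard definition (the k-th smallest pairwise absolute difference, with h = floor(n/2) + 1 and k = C(h, 2)) and deliberately omits the consistency constants d and dn, since the article reports corrections to them.

```python
from itertools import combinations
from math import comb


def qn_unscaled(xs):
    """Return the k-th smallest |x_i - x_j| (i < j), without scaling constants."""
    n = len(xs)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)
    # Naive approach: materialize and sort all n(n-1)/2 pairwise differences.
    diffs = sorted(abs(a - b) for a, b in combinations(xs, 2))
    return diffs[k - 1]
```

A sketch like this is mainly useful as a correctness oracle when testing a faster implementation on small samples.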
Citations: 0
Algorithm xxx: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-12-05 DOI: 10.1145/3573383
G. Quintana-Ortí, Fernando Hernando, F. D. Igual
The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this paper, we introduce a family of implementations of the Brouwer-Zimmermann algorithm for distributed-memory architectures for computing the minimum distance of a random linear code over \(\mathbb{F}_{2}\). Current commercial and public-domain software works only on either unicore architectures or shared-memory architectures, which are limited in the number of cores/processors employed in the computation. Our implementations focus on distributed-memory architectures, thus being able to employ hundreds or even thousands of cores in the computation of the minimum distance. Our experimental results show that our implementations are much faster, by up to several orders of magnitude, than the implementations in wide use today.
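
To make the quantity concrete, the sketch below computes the minimum distance of a small binary linear code by exhaustive enumeration. It is not the Brouwer-Zimmermann algorithm and scales exponentially in the code dimension; the function name `minimum_distance` and the encoding of generator rows as integer bitmasks are assumptions made for this illustration.

```python
def minimum_distance(generator_rows):
    """Minimum Hamming weight over all nonzero codewords of a binary code.

    generator_rows: list of ints, each the bitmask of one generator-matrix row.
    """
    k = len(generator_rows)
    best = None
    for msg in range(1, 1 << k):
        # Encode the message: XOR the rows selected by its set bits.
        word = 0
        for i in range(k):
            if (msg >> i) & 1:
                word ^= generator_rows[i]
        weight = bin(word).count("1")
        # A zero word can only arise from linearly dependent rows; skip it.
        if weight and (best is None or weight < best):
            best = weight
    return best
```

For a linear code, the minimum distance equals the minimum weight of a nonzero codeword, which is why enumerating codewords suffices. With the systematic generator rows `0b1000110, 0b0100101, 0b0010011, 0b0001111` of a [7,4] Hamming code, the function returns 3.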
Citations: 0
Array-Aware Matching: Taming the Complexity of Large-Scale Simulation Models
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-11-22 DOI: 10.1145/3611661
Massimo Fioravanti, Daniele Cattaneo, F. Terraneo, Silvano Seva, Stefano Cherubin, G. Agosta, F. Casella, A. Leva
Equation-based modelling is a powerful approach to tame the complexity of large-scale simulation problems. Equation-based tools automatically translate models into imperative languages. When confronted with today's problems, however, well-assessed model translation techniques exhibit scalability issues that are particularly severe when models contain very large arrays. In fact, such models can be made very compact by enclosing equations into looping constructs, but reflecting the same compactness into the translated imperative code is nontrivial. In this paper, we face this issue by concentrating on a key step of equations-to-code translation, the equation/variable matching. We first show that an efficient translation of models with (large) arrays needs awareness of their presence, by defining a figure of merit to measure how much of the looping constructs is preserved along the translation. We then show that this figure of merit allows one to define an optimal array-aware matching and, as our main result, that the optimal array-aware matching problem so stated is NP-complete. As an additional result, we propose a heuristic algorithm capable of performing array-aware matching in polynomial time. The proposed algorithm can be proficiently used by model translator developers in the implementation of efficient tools for large-scale system simulation.
Citations: 1
Algorithm 1031: MQSI—Monotone Quintic Spline Interpolation
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-11-01 DOI: 10.1145/3570157
T. Lux, L.T. Watson, Tyler H. Chang, W. Thacker
MQSI is a Fortran 2003 subroutine for constructing monotone quintic spline interpolants to univariate monotone data. Using sharp theoretical monotonicity constraints, first and second derivative estimates at data provided by a quadratic facet model are refined to produce a univariate C^2 monotone interpolant. Algorithm and implementation details, complexity and sensitivity analyses, usage information, a brief performance study, and comparisons with other spline approaches are included.
Citations: 2
Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets
IF 2.7 Zone 1 (Mathematics) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2022-10-31 DOI: 10.1145/3570158
J. Peters, K. Lo, K. Karčiauskas
For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. Akin to tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout, star-configuration where n ≠ 4 quadrilateral faces join around an interior vertex, n-gon configurations, where 2n quadrilaterals surround an n-gon, polar configurations where a cone of n triangles meeting at a vertex is surrounded by a ribbon of n quadrilaterals, and three types of T-junctions where two quad-strips merge into one. The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein-Bézier form with coefficients depending linearly on the polyhedral control net, so that evaluation, differentiation, integration, moments, and so on, are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and for computing functions on the geometry. Although polyhedral splines do not offer nested refinement by refinement of the control net, polyhedral splines support engineering analysis of curved smooth objects. Coarse nets typically suffice since the splines efficiently model curved features. Algorithm 1032 is a C++ library with input-output example pairs and an IGES output choice.
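
The claim that evaluation costs no more than for tensor-product splines follows from the Bernstein-Bézier representation of each piece. The Python sketch below evaluates a single bi-cubic piece from a 4 x 4 grid of scalar coefficients; it does not reproduce the C++ library of Algorithm 1032, and the names `bernstein3` and `eval_bicubic` are hypothetical. For a surface, the same evaluation would be applied to each coordinate separately.

```python
from math import comb


def bernstein3(t):
    """The four cubic Bernstein basis polynomials evaluated at t."""
    return [comb(3, i) * t**i * (1 - t) ** (3 - i) for i in range(4)]


def eval_bicubic(coeffs, u, v):
    """Evaluate a bi-cubic Bernstein-Bezier piece at (u, v) in [0, 1]^2.

    coeffs: 4 x 4 nested list of scalar Bezier coefficients.
    """
    bu, bv = bernstein3(u), bernstein3(v)
    return sum(bu[i] * bv[j] * coeffs[i][j] for i in range(4) for j in range(4))
```

Because the value is linear in the coefficients, and the coefficients depend linearly on the control net, derivatives and integrals reduce to the same kind of weighted sums.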
ACM Transactions on Mathematical Software, vol. 49, no. 1, pp. 1-12.
Citations: 2