In the quest for tighter relaxations of combinatorial optimization problems, semidefinite programming generalizes linear programming, offering better bounds while remaining polynomially solvable. Yet, in practice, a semidefinite program is still significantly harder to solve than a Linear Program (LP) of similar size. It is well known that a semidefinite program can be written as an LP with infinitely many cuts, which could in principle be solved by repeated separation in a Cutting-Planes scheme; in practice, this approach is likely to fail. We proposed in [Projective Cutting-Planes, Daniel Porumbel, SIAM Journal on Optimization, 2020] the Projective Cutting-Planes method, which upgrades the well-known separation sub-problem to the projection sub-problem: given a feasible $y$ inside a polytope $P$ and a direction $d$, find the maximum $t^*$ such that $y+t^*d \in P$. Using this new sub-problem, one can generate a sequence of both inner and outer solutions that converge to the optimum over $P$. This paper shows that the projection sub-problem can be solved very efficiently in a semidefinite programming context, enabling the resulting method to compete very well with state-of-the-art semidefinite optimization software (refined over decades). Results suggest it may be the fastest method for matrix sizes larger than $2000 \times 2000$.
Semidefinite Programming by Projective Cutting Planes. Daniel Porumbel. arXiv - CS - Mathematical Software, 2023-11-15. https://doi.org/arxiv-2311.09365
Max von Hippel (Northeastern University), Panagiotis Manolios (Northeastern University), Kenneth L. McMillan (University of Texas at Austin), Cristina Nita-Rotaru (Northeastern University), Lenore Zuck (University of Illinois Chicago)
When verifying computer systems we sometimes want to study their asymptotic behaviors, i.e., how they behave in the long run. In such cases, we need real analysis, the area of mathematics that deals with limits and the foundations of calculus. In a prior work, we used real analysis in ACL2s to study the asymptotic behavior of the RTO computation, commonly used in congestion control algorithms across the Internet. One key component in our RTO computation analysis was proving in ACL2s that for all alpha in [0, 1), the limit as n approaches infinity of alpha raised to n is zero. Whereas the most obvious proof strategy involves the logarithm, whose codomain includes irrationals, by default ACL2 only supports rationals, which forced us to take a non-standard approach. In this paper, we explore different approaches to proving the above result in ACL2(r) and ACL2s, from the perspective of a relatively new user to each. We also contextualize the theorem by showing how it allowed us to prove important asymptotic properties of the RTO computation. Finally, we discuss tradeoffs between the various proof strategies and directions for future research.
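One logarithm-free argument for this limit (not necessarily the one the authors formalized) keeps every intermediate quantity rational, via Bernoulli's inequality:

```latex
\text{For } \alpha \in (0,1),\ \text{write } x = \frac{1-\alpha}{\alpha} > 0,
\ \text{so } \alpha = \frac{1}{1+x}. \quad
\text{Bernoulli: } (1+x)^n \ge 1 + nx
\ \Rightarrow\
0 \le \alpha^n = \frac{1}{(1+x)^n} \le \frac{1}{1+nx}.
```

Hence for any rational $\epsilon > 0$, choosing $n > 1/(\epsilon x)$ gives $\alpha^n < \epsilon$; the case $\alpha = 0$ is immediate. Every quantity involved is rational whenever $\alpha$ is.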
A Case Study in Analytic Protocol Analysis in ACL2. arXiv - CS - Mathematical Software, 2023-11-15. https://doi.org/arxiv-2311.08855
Factorization and multiplication of dense matrices and tensors are critical, yet extremely expensive pieces of the scientific toolbox. Careful use of low rank approximation can drastically reduce the computation and memory requirements of these operations. In addition to a lower arithmetic complexity, such methods can, by their structure, be designed to efficiently exploit modern hardware architectures. The majority of existing work relies on batched BLAS libraries to handle the computation of many small dense matrices. We show that through careful analysis of the cache utilization, register accumulation using SIMD registers, and a redesign of the implementation, one can achieve significantly higher throughput for these types of batched low-rank matrices across a large range of block and batch sizes. We test our algorithm on three CPUs with diverse ISAs -- the Fujitsu A64FX using ARM SVE, the Intel Xeon 6148 using AVX-512, and the AMD EPYC 7502 using AVX2 -- and show that our new batching methodology obtains more than twice the throughput of vendor-optimized libraries for all CPU architectures and problem sizes.
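To illustrate the data layout being targeted (the shapes and names below are ours, not the paper's kernels): a batch of small blocks stored in low-rank factored form is applied with two thin products per block instead of one dense one, which is exactly the workload that batched BLAS, or the hand-tuned kernels described above, must make fast.

```python
import numpy as np

# A batch of small low-rank blocks A_i ~= U_i @ V_i applied to vectors x_i.
# Factored application costs O(batch * m * r) instead of O(batch * m * n).
rng = np.random.default_rng(0)
batch, m, n, r = 64, 32, 32, 4          # hypothetical block/batch sizes
U = rng.standard_normal((batch, m, r))
V = rng.standard_normal((batch, r, n))
x = rng.standard_normal((batch, n))

# Two rank-r products per block; einsum keeps the whole batch in one call.
y_factored = np.einsum('bmr,br->bm', U, np.einsum('brn,bn->br', V, x))

# Reference: form each dense block explicitly and multiply.
y_dense = np.einsum('bmn,bn->bm', U @ V, x)
assert np.allclose(y_factored, y_dense)
```

The per-block inner products are small enough that, in a native implementation, the accumulators fit in SIMD registers, which is where the cache and register analysis in the paper pays off.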
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors. Sameer Deshmukh, Rio Yokota, George Bosilca. arXiv - CS - Mathematical Software, 2023-11-11. https://doi.org/arxiv-2311.07602
Mitchell Tong Harris, Pierre-David Letourneau, Dalton Jones, M. Harper Langston
We present an efficient framework for solving constrained global non-convex polynomial optimization problems. We prove the existence of an equivalent nonlinear reformulation of such problems that possesses essentially no spurious local minima. We show through numerical experiments that polynomial scaling in dimension and degree is achievable for computing the optimal value and location of previously intractable global constrained polynomial optimization problems in high dimension.
An Efficient Framework for Global Non-Convex Polynomial Optimization with Nonlinear Polynomial Constraints. arXiv - CS - Mathematical Software, 2023-11-03. https://doi.org/arxiv-2311.02037
Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca
Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce the time for dense direct factorization from $O(N^3)$ to $O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low rank matrix format that can be factorized using a Cholesky-like algorithm called ULV factorization. The HSS-ULV algorithm is highly parallel because it removes the dependency on trailing sub-matrices at each HSS level. However, a key merge step that links two successive HSS levels remains a challenge for efficient parallelization. In this paper, we use an asynchronous runtime system PaRSEC with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both state-of-the-art implementations of dense direct low rank factorization, and achieve up to 2x better factorization time for matrices arising from a diverse set of applications on up to 128 nodes of Fugaku for similar or better accuracy for all the problems that we survey.
$O(N)$ distributed direct factorization of structured dense matrices using runtime systems. arXiv - CS - Mathematical Software, 2023-11-02. https://doi.org/arxiv-2311.00921
In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code. Sparse matrix algorithms implemented with the ESIMD API achieved performance close to the peak of the targeted Intel Data Center GPU. We compare our performance results to Intel's oneMKL library on Intel GPUs and to a recent CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and demonstrate that our implementations for sparse matrix operations outperform both.
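For readers unfamiliar with the three kernels, their semantics can be sketched with SciPy (this is only a reference definition of the operations, not the paper's GPU implementation; the variable names are ours):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 6))
S = sp.random(6, 6, density=0.3, random_state=1, format='csr')
H = rng.standard_normal((6, 4))

# SDDMM: dense-dense product A @ B, sampled at the sparsity pattern of S.
sddmm = S.multiply(A @ B)          # sparse result, same pattern as S

# SPMM: sparse-dense product.  FusedMM is the composition of the two.
fusedmm = sddmm @ H
assert fusedmm.shape == (6, 4)
```

Fusing the two steps avoids materializing the intermediate sparse matrix, which is the point of the FusedMM kernel.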
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU. Mohammad Zubair, Christoph Bauinger. arXiv - CS - Mathematical Software, 2023-11-01. https://doi.org/arxiv-2311.00368
NoMoPy is a code for fitting, analyzing, and generating noise modeled as a hidden Markov model (HMM) or, more generally, factorial hidden Markov model (FHMM). This code, written in Python, implements approximate and exact expectation maximization (EM) algorithms for performing the parameter estimation process, model selection procedures via cross-validation, and parameter confidence region estimation. Here, we describe in detail the functionality implemented in NoMoPy and provide examples of its use and performance on example problems.
NoMoPy: Noise Modeling in Python. Dylan Albrecht, N. Tobias Jacobson. arXiv - CS - Mathematical Software, 2023-10-31. https://doi.org/arxiv-2311.00084
Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd
We consider multilevel low rank (MLR) matrices, defined as a row and column permutation of a sum of matrices, each one a block diagonal refinement of the previous one, with all blocks low rank given in factored form. MLR matrices extend low rank matrices but share many of their properties, such as the total storage required and complexity of matrix-vector multiplication. We address three problems that arise in fitting a given matrix by an MLR matrix in the Frobenius norm. The first problem is factor fitting, where we adjust the factors of the MLR matrix. The second is rank allocation, where we choose the ranks of the blocks in each level, subject to the total rank having a given value, which preserves the total storage needed for the MLR matrix. The final problem is to choose the hierarchical partition of rows and columns, along with the ranks and factors. This paper is accompanied by an open source package that implements the proposed methods.
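A minimal two-level instance makes the definition concrete (a sketch with hypothetical sizes; the row/column permutations are omitted for brevity, and this is not the accompanying package): level 0 is one low-rank term, and level 1 refines it with two low-rank diagonal blocks, so storage and matrix-vector cost stay linear in the dimension for fixed ranks.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 8, 2
U0, V0 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U1 = [rng.standard_normal((n // 2, r)) for _ in range(2)]
V1 = [rng.standard_normal((n // 2, r)) for _ in range(2)]

def mlr_matvec(x):
    y = U0 @ (V0.T @ x)                       # level-0 low-rank term
    for i in range(2):                        # level-1 block-diagonal terms
        s = slice(i * n // 2, (i + 1) * n // 2)
        y[s] += U1[i] @ (V1[i].T @ x[s])
    return y

# Dense reference: level-0 matrix plus the block-diagonal level-1 matrix.
A = U0 @ V0.T
for i in range(2):
    s = slice(i * n // 2, (i + 1) * n // 2)
    A[s, s] += U1[i] @ V1[i].T

x = rng.standard_normal(n)
assert np.allclose(mlr_matvec(x), A @ x)
```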
Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices. arXiv - CS - Mathematical Software, 2023-10-30. https://doi.org/arxiv-2310.19214
The Hurst exponent is a significant indicator for characterizing the self-similarity and long-term memory properties of time sequences. It has wide applications in physics, technology, engineering, mathematics, statistics, economics, psychology, and other fields. Currently, available methods for estimating the Hurst exponent of time sequences can be divided into different categories: time-domain methods and spectrum-domain methods, based on the representation of the time sequence; and linear regression methods and Bayesian methods, based on the parameter estimation approach. Although various methods are discussed in the literature, there are still some deficiencies: the descriptions of the estimation algorithms are purely mathematics-oriented and pseudo-code is missing; the effectiveness and accuracy of the estimation algorithms are not clear; and the classification of estimation methods is not considered, so there is a lack of guidance for selecting among them. In this work, the emphasis is put on thirteen dominant methods for estimating the Hurst exponent. To reduce the difficulty of implementing the estimation methods in computer programs, the mathematical principles are discussed briefly and the pseudo-code of each algorithm is presented with the necessary details.
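As a taste of the time-domain family, the classical rescaled-range (R/S) estimator can be sketched in a few lines (an illustration only; the survey itself covers thirteen methods and their trade-offs):

```python
import numpy as np

def hurst_rs(x, min_win=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    sizes, rs = [], []
    win = min_win
    while win <= len(x) // 2:
        vals = []
        for start in range(0, len(x) - win + 1, win):
            seg = x[start:start + win]
            dev = np.cumsum(seg - seg.mean())    # cumulative deviations
            r = dev.max() - dev.min()            # range of deviations
            s = seg.std()                        # scale
            if s > 0:
                vals.append(r / s)
        sizes.append(win)
        rs.append(np.mean(vals))
        win *= 2
    # Hurst exponent = slope of log(R/S) against log(window size)
    return np.polyfit(np.log(sizes), np.log(rs), 1)[0]

rng = np.random.default_rng(3)
h = hurst_rs(rng.standard_normal(4096))   # i.i.d. noise: H should be near 0.5
assert 0.25 < h < 0.75
```

The small-sample bias of the plain R/S statistic is one of the accuracy issues that motivates comparing it against the other estimators surveyed.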
A Survey of Methods for Estimating Hurst Exponent of Time Sequence. Hong-Yan Zhang, Zhi-Qiang Feng, Si-Yu Feng, Yu Zhou. arXiv - CS - Mathematical Software, 2023-10-29. https://doi.org/arxiv-2310.19051
The optimization of matrix multiplication (GEMM) has been a constant need over the last decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel OneAPI because of its widespread use in a large variety of scientific applications. The GEMM is usually implemented following the GotoBLAS philosophy, which tiles the GEMM operands and uses a series of nested loops for performance improvement. These approaches extract the maximum computational power of the architectures through small pieces of hardware-oriented, high-performance code called micro-kernels. However, this approach forces developers to generate, with a non-negligible effort, a dedicated micro-kernel for each new piece of hardware. In this work, we present a step-by-step procedure for generating micro-kernels with the Exo compiler that perform close to (or even better than) manually developed micro-kernels written with intrinsic functions or assembly language. Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
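The GotoBLAS loop structure referred to above can be sketched schematically (in Python for clarity, not speed; the blocking parameters are illustrative, and the innermost update stands in for the hand-written or Exo-generated micro-kernel):

```python
import numpy as np

def tiled_gemm(A, B, mc=64, kc=64, nc=64, mr=4, nr=4):
    """GotoBLAS-style five-loop GEMM; the innermost block update plays
    the role of the micro-kernel holding an mr x nr tile of C in registers."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for jc in range(0, n, nc):            # loop 5: panels of B (L3 cache)
        for pc in range(0, k, kc):        # loop 4: depth blocking (L2 cache)
            for ic in range(0, m, mc):    # loop 3: panels of A (L1/L2 cache)
                for jr in range(jc, min(jc + nc, n), nr):      # loop 2
                    for ir in range(ic, min(ic + mc, m), mr):  # loop 1
                        # "micro-kernel": rank-kc update of an mr x nr tile
                        C[ir:ir + mr, jr:jr + nr] += (
                            A[ir:ir + mr, pc:pc + kc]
                            @ B[pc:pc + kc, jr:jr + nr]
                        )
    return C

rng = np.random.default_rng(4)
A, B = rng.standard_normal((100, 80)), rng.standard_normal((80, 90))
assert np.allclose(tiled_gemm(A, B), A @ B)
```

In a real library only the innermost update is hardware-specific, which is why generating that micro-kernel automatically, as done here with Exo, removes most of the porting effort.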
Tackling the Matrix Multiplication Micro-kernel Generation with Exo. Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez. arXiv - CS - Mathematical Software, 2023-10-26. https://doi.org/arxiv-2310.17408