The BTPE algorithm (Binomial, Triangle, Parallelogram, Exponential) of Kachitvichyanukul and Schmeiser is one of the faster and more widely used algorithms for generating binomial random variates. Cicirello's open source Java library, $\rho\mu$, includes an implementation of BTPE as well as a variety of other random-number-related utilities. In this report, I explore the average-case runtime of the BTPE algorithm when generating random values from the binomial distribution $B(n,p)$. Beginning with Kachitvichyanukul and Schmeiser's formula for the expected number of acceptance-rejection sampling iterations, I analyze the limit behavior as $n$ approaches infinity and show that the average runtime of BTPE converges to a constant. I instrument the open source Java implementation from the $\rho\mu$ library to experimentally validate the analysis.
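As a quick empirical companion (not taken from the report): NumPy's generator is understood to use a BTPE-style rejection sampler when $np$ is large, so the claimed constant average runtime can be sanity-checked by timing draws as $n$ grows. A minimal sketch:

```python
import timeit
import numpy as np

rng = np.random.default_rng(42)

# If the expected number of rejection iterations converges to a constant,
# the per-variate cost should stay roughly flat as n grows.
for n in (10**3, 10**5, 10**7, 10**9):
    t = timeit.timeit(lambda: rng.binomial(n, 0.3, size=10_000), number=20)
    print(f"n={n:>10}: {t / (20 * 10_000) * 1e9:8.1f} ns per variate")
```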
{"title":"On the Average Runtime of an Open Source Binomial Random Variate Generation Algorithm","authors":"Vincent A. Cicirello","doi":"arxiv-2403.11018","DOIUrl":"https://doi.org/arxiv-2403.11018","url":null,"abstract":"The BTPE algorithm (Binomial, Triangle, Parallelogram, Exponential) of\u0000Kachitvichyanukul and Schmeiser is one of the faster and more widely utilized\u0000algorithms for generating binomial random variates. Cicirello's open source\u0000Java library, $rhomu$, includes an implementation of BTPE as well as a\u0000variety of other random number related utilities. In this report, I explore the\u0000average case runtime of the BTPE algorithm when generating random values from\u0000binomial distribution $B(n,p)$. Beginning with Kachitvichyanukul and\u0000Schmeiser's formula for the expected number of acceptance-rejection sampling\u0000iterations, I analyze the limit behavior as $n$ approaches infinity, and show\u0000that the average runtime of BTPE converges to a constant. I instrument the open\u0000source Java implementation from the $rhomu$ library to experimentally\u0000validate the analysis.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140172911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexej Moskovka, Talal Rahman, Jan Valdman, Jon Eivind Vatne
When writing high-performance code for numerical computation in a scripting language like MATLAB, it is crucial to vectorize the operations in any large for-loop. If not, the code becomes too slow to use, even for a moderately large problem. However, in the process of vectorizing, the code often loses its original structure and becomes less readable. This is particularly true for finite element implementations, even though finite element methods are inherently structured. A basic remedy is to separate the vectorization part of the code from the mathematics part, which is easily achieved by building the code on top of basic linear algebra subprograms that are themselves already vectorized, an idea used in a series of papers over the last fifteen years to develop codes that are fast and still structured and readable. We discuss the vectorized basic linear algebra package and introduce a formalism using multi-linear algebra to explain and formally define the functions in the package, as well as MATLAB's page-wise functions (such as pagemtimes). We provide examples from computations of varying complexity, including the computation of normal vectors, volumes, and finite element methods. Benchmarking shows that we also get fast computations. Using the library, we can write code that closely follows our mathematical thinking, making the code easier to write, follow, reuse, and extend.
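The package itself targets MATLAB, but the vectorization idea is language-neutral. A minimal NumPy sketch (not the package's code) of the normal-vector computation mentioned above, contrasting the per-element loop with a single batched operation:

```python
import numpy as np

# 100,000 triangles: vertices[i] holds the 3x3 matrix of triangle i's corners.
rng = np.random.default_rng(0)
vertices = rng.random((100_000, 3, 3))

# Loop version (slow): one cross product per triangle.
# normals = np.array([np.cross(v[1] - v[0], v[2] - v[0]) for v in vertices])

# Vectorized version: a single batched cross product over all triangles.
e1 = vertices[:, 1, :] - vertices[:, 0, :]
e2 = vertices[:, 2, :] - vertices[:, 0, :]
normals = np.cross(e1, e2)                     # shape (100_000, 3)
areas = 0.5 * np.linalg.norm(normals, axis=1)  # triangle areas from the same data
```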
{"title":"On a vectorized basic linear algebra package for prototyping codes in MATLAB","authors":"Alexej Moskovka, Talal Rahman, Jan Valdman, Jon Eivind Vatne","doi":"arxiv-2404.16039","DOIUrl":"https://doi.org/arxiv-2404.16039","url":null,"abstract":"When writing high-performance code for numerical computation in a scripting\u0000language like MATLAB, it is crucial to have the operations in a large for-loop\u0000vectorized. If not, the code becomes too slow to use, even for a moderately\u0000large problem. However, in the process of vectorizing, the code often loses its\u0000original structure and becomes less readable. This is particularly true in the\u0000case of a finite element implementation, even though finite element methods are\u0000inherently structured. A basic remedy to this is the separation of the\u0000vectorization part from the mathematics part of the code, which is easily\u0000achieved through building the code on top of the basic linear algebra\u0000subprograms that are already vectorized codes, an idea that has been used in a\u0000series of papers over the last fifteen years, developing codes that are fast\u0000and still structured and readable. We discuss the vectorized basic linear\u0000algebra package and introduce a formalism using multi-linear algebra to explain\u0000and define formally the functions in the package, as well as MATLAB pagetime\u0000functions. We provide examples from computations of varying complexity,\u0000including the computation of normal vectors, volumes, and finite element\u0000methods. Benchmarking shows that we also get fast computations. Using the\u0000library, we can write codes that closely follow our mathematical thinking,\u0000making writing, following, reusing, and extending the code easier.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"105 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A learning rate scheduler is a predefined set of instructions for varying search step sizes during model training. This paper introduces a new logarithmic method that uses harsh restarting of step sizes within stochastic gradient descent. Cyclical log annealing implements the restart pattern more aggressively, which may allow greedier algorithms to be used within the online convex optimization framework. The algorithm was tested on the CIFAR-10 image dataset and performed comparably to cosine annealing on large transformer-enhanced residual neural networks. Future work would involve testing the scheduler in generative adversarial networks and finding the best scheduler parameters through further experiments.
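The abstract does not give the schedule's formula, so the following is only a hypothetical sketch of a log-shaped decay with harsh restarts; the function name, cycle length, and learning rate bounds are all invented for illustration and the paper's actual schedule may differ:

```python
import math

def cyclical_log_annealing(step, cycle_len=1000, lr_max=0.1, lr_min=1e-4):
    """Hypothetical sketch: decay the learning rate logarithmically within
    each cycle, then restart harshly back at lr_max (cf. warm restarts)."""
    t = step % cycle_len                                # position within the current cycle
    frac = math.log(1 + t) / math.log(1 + cycle_len)    # runs from 0 toward ~1 over the cycle
    return lr_min + (lr_max - lr_min) * (1.0 - frac)

lrs = [cyclical_log_annealing(s) for s in range(3000)]  # three decay/restart cycles
```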
{"title":"Cyclical Log Annealing as a Learning Rate Scheduler","authors":"Philip Naveen","doi":"arxiv-2403.14685","DOIUrl":"https://doi.org/arxiv-2403.14685","url":null,"abstract":"A learning rate scheduler is a predefined set of instructions for varying\u0000search stepsizes during model training processes. This paper introduces a new\u0000logarithmic method using harsh restarting of step sizes through stochastic\u0000gradient descent. Cyclical log annealing implements the restart pattern more\u0000aggressively to maybe allow the usage of more greedy algorithms on the online\u0000convex optimization framework. The algorithm was tested on the CIFAR-10 image\u0000datasets, and seemed to perform analogously with cosine annealing on large\u0000transformer-enhanced residual neural networks. Future experiments would involve\u0000testing the scheduler in generative adversarial networks and finding the best\u0000parameters for the scheduler with more experiments.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Calculating the inverse of $k$-diagonal circulant matrices and cyclic banded matrices is a more challenging problem than calculating their determinants. Algorithms that achieve linear or quadratic complexity for the inverses of these two types of matrices are rare. This paper presents two fast algorithms: the inverse of a $k$-diagonal circulant matrix can be computed within complexity $O(k^3 \log n + k^4) + kn$, and that of a $k$-diagonal cyclic banded matrix within $O(k^3 n + k^5) + kn^2$. Since $k$ is generally much smaller than $n$, the costs of these two algorithms can be approximated as $kn$ and $kn^2$, respectively.
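For context, a dense circulant inverse is classically available in $O(n \log n)$ via the DFT, since a circulant matrix is diagonalized by the Fourier matrix. The sketch below shows that standard route (not the paper's algorithms) on a 3-diagonal circulant matrix:

```python
import numpy as np
from scipy.linalg import circulant

# A k-diagonal circulant matrix is determined by its first column, which has
# at most k nonzero (cyclically placed) entries.
n = 8
c = np.zeros(n)
c[[0, 1, n - 1]] = [4.0, 1.0, 1.0]        # tridiagonal circulant (k = 3)
C = circulant(c)

# Standard FFT route: the eigenvalues of C are fft(c), so the inverse is the
# circulant matrix whose first column is ifft(1 / fft(c)).
inv_first_col = np.fft.ifft(1.0 / np.fft.fft(c)).real
C_inv = circulant(inv_first_col)

print(np.allclose(C @ C_inv, np.eye(n)))  # True
```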
{"title":"Efficient Calculations for k-diagonal Circulant Matrices and Cyclic Banded Matrices","authors":"Chen Wang, Chao Wang","doi":"arxiv-2403.05048","DOIUrl":"https://doi.org/arxiv-2403.05048","url":null,"abstract":"Calculating the inverse of $k$-diagonal circulant matrices and cyclic banded\u0000matrices is a more challenging problem than calculating their determinants.\u0000Algorithms that directly involve or specify linear or quadratic complexity for\u0000the inverses of these two types of matrices are rare. This paper presents two\u0000fast algorithms that can compute the complexity of a $k$-diagonal circulant\u0000matrix within complexity $O(k^3 log n+k^4)+kn$, and for $k$-diagonal cyclic\u0000banded matrices it is $O(k^3 n+k^5)+kn^2$. Since $k$ is generally much smaller\u0000than $n$, the cost of these two algorithms can be approximated as $kn$ and\u0000$kn^2$.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140097483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Ruben Ballester, Claudio Battiloro, Guillermo Bernárdez, Tolga Birdal, Aiden Brent, Peter Chin, Sergio Escalera, Simone Fiorellino, Odin Hoff Gardaa, Gurusankar Gopalakrishnan, Devendra Govil, Josef Hoppe, Maneel Reddy Karri, Jude Khouja, Manuel Lecha, Neal Livesay, Jan Meißner, Soham Mukherjee, Alexander Nikitin, Theodore Papamarkou, Jaro Prílepok, Karthikeyan Natesan Ramamurthy, Paul Rosen, Aldo Guzmán-Sáenz, Alessandro Salatiello, Shreyas N. Samaga, Simone Scardapane, Michael T. Schaub, Luca Scofano, Indro Spinelli, Lev Telyatnikov, Quang Truong, Robin Walters, Maosheng Yang, Olga Zaghen, Ghada Zamzmi, Ali Zia, Nina Miolane
We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team.
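As a generic, library-independent illustration of the higher-order structure these packages operate on (plain NumPy, deliberately not the topox API), here is the boundary-operator bookkeeping for the smallest simplicial complex with a 2-cell:

```python
import numpy as np

# Triangle complex {1,2,3}: three vertices, three edges, one 2-cell.
edges = [(1, 2), (1, 3), (2, 3)]
triangles = [(1, 2, 3)]

# B2 maps 2-cells to signed sums of their edges:
# boundary(1,2,3) = (2,3) - (1,3) + (1,2).
B2 = np.zeros((len(edges), len(triangles)))
for j, (a, b, c) in enumerate(triangles):
    for sign, face in [(1, (b, c)), (-1, (a, c)), (1, (a, b))]:
        B2[edges.index(face), j] = sign

# B1 maps edges to vertices: boundary(a,b) = b - a.
B1 = np.zeros((3, len(edges)))
for j, (a, b) in enumerate(edges):
    B1[a - 1, j], B1[b - 1, j] = -1, 1

print(B1 @ B2)  # zero matrix: "the boundary of a boundary is zero"
```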
{"title":"TopoX: A Suite of Python Packages for Machine Learning on Topological Domains","authors":"Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Ruben Ballester, Claudio Battiloro, Guillermo Bernárdez, Tolga Birdal, Aiden Brent, Peter Chin, Sergio Escalera, Simone Fiorellino, Odin Hoff Gardaa, Gurusankar Gopalakrishnan, Devendra Govil, Josef Hoppe, Maneel Reddy Karri, Jude Khouja, Manuel Lecha, Neal Livesay, Jan Meißner, Soham Mukherjee, Alexander Nikitin, Theodore Papamarkou, Jaro Prílepok, Karthikeyan Natesan Ramamurthy, Paul Rosen, Aldo Guzmán-Sáenz, Alessandro Salatiello, Shreyas N. Samaga, Simone Scardapane, Michael T. Schaub, Luca Scofano, Indro Spinelli, Lev Telyatnikov, Quang Truong, Robin Walters, Maosheng Yang, Olga Zaghen, Ghada Zamzmi, Ali Zia, Nina Miolane","doi":"arxiv-2402.02441","DOIUrl":"https://doi.org/arxiv-2402.02441","url":null,"abstract":"We introduce topox, a Python software suite that provides reliable and\u0000user-friendly building blocks for computing and machine learning on topological\u0000domains that extend graphs: hypergraphs, simplicial, cellular, path and\u0000combinatorial complexes. topox consists of three packages: toponetx facilitates\u0000constructing and computing on these domains, including working with nodes,\u0000edges and higher-order cells; topoembedx provides methods to embed topological\u0000domains into vector spaces, akin to popular graph-based embedding algorithms\u0000such as node2vec; topomodelx is built on top of PyTorch and offers a\u0000comprehensive toolbox of higher-order message passing functions for neural\u0000networks on topological domains. The extensively documented and unit-tested\u0000source code of topox is available under MIT license at\u0000https://github.com/pyt-team.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The covariance matrix adaptation evolution strategy (CMA-ES) has been highly effective in black-box continuous optimization, as demonstrated by its success in both benchmark problems and various real-world applications. To address the need for an accessible yet potent tool in this domain, we developed cmaes, a simple and practical Python library for CMA-ES. cmaes is characterized by its simplicity, offering intuitive use and high code readability. This makes it suitable for quickly trying out CMA-ES, as well as for educational purposes and seamless integration into other libraries. Despite its simple design, cmaes provides substantial functionality. It incorporates recent advancements in CMA-ES, such as learning rate adaptation for challenging scenarios, transfer learning, and mixed-integer optimization capabilities. These advanced features are accessible through a user-friendly API, ensuring that cmaes can be easily adopted in practical applications. We regard cmaes as the first choice for a Python CMA-ES library among practitioners. The software is available under the MIT license at https://github.com/CyberAgentAILab/cmaes.
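For reference, the library's documented ask-and-tell interface looks roughly like the following sketch (the quadratic objective and hyperparameters here are illustrative, not from the paper):

```python
import numpy as np
from cmaes import CMA

def quadratic(x):
    # toy objective with optimum at (3, -2)
    return (x[0] - 3) ** 2 + (10 * (x[1] + 2)) ** 2

optimizer = CMA(mean=np.zeros(2), sigma=1.3)
for generation in range(50):
    solutions = []
    for _ in range(optimizer.population_size):
        x = optimizer.ask()                 # sample a candidate from the current distribution
        solutions.append((x, quadratic(x)))
    optimizer.tell(solutions)               # update mean, covariance, and step size
```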
{"title":"cmaes : A Simple yet Practical Python Library for CMA-ES","authors":"Masahiro Nomura, Masashi Shibata","doi":"arxiv-2402.01373","DOIUrl":"https://doi.org/arxiv-2402.01373","url":null,"abstract":"The covariance matrix adaptation evolution strategy (CMA-ES) has been highly\u0000effective in black-box continuous optimization, as demonstrated by its success\u0000in both benchmark problems and various real-world applications. To address the\u0000need for an accessible yet potent tool in this domain, we developed cmaes, a\u0000simple and practical Python library for CMA-ES. cmaes is characterized by its\u0000simplicity, offering intuitive use and high code readability. This makes it\u0000suitable for quickly using CMA-ES, as well as for educational purposes and\u0000seamless integration into other libraries. Despite its simplistic design, cmaes\u0000maintains enhanced functionality. It incorporates recent advancements in\u0000CMA-ES, such as learning rate adaptation for challenging scenarios, transfer\u0000learning, and mixed-integer optimization capabilities. These advanced features\u0000are accessible through a user-friendly API, ensuring that cmaes can be easily\u0000adopted in practical applications. We regard cmaes as the first choice for a\u0000Python CMA-ES library among practitioners. The software is available under the\u0000MIT license at https://github.com/CyberAgentAILab/cmaes.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"178 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning technologies because numerous methods rely on them. The field of machine learning holds the potential for substantial advancements across various domains, as exemplified by recent breakthroughs in Large Language Models (LLMs). However, despite the growing interest, persistent concerns remain around reproducibility and energy consumption. Reproducibility is crucial for robust scientific inquiry and explainability, while energy efficiency underscores the imperative to conserve finite global resources. This study investigates whether the leading PRNGs employed in machine learning languages, libraries, and frameworks uphold statistical quality and numerical reproducibility when compared to the original C implementations of the respective PRNG algorithms. Additionally, we evaluate the time efficiency and energy consumption of the various implementations. Our experiments encompass Python, NumPy, TensorFlow, and PyTorch, using the Mersenne Twister, PCG, and Philox algorithms. We found that the temporal performance of the machine learning technologies closely aligns with that of the C-based implementations, in some instances even surpassing it, and that the ML technologies consumed only about 10% more energy than their C-implementation counterparts. However, while statistical quality was found to be comparable, numerical reproducibility across different platforms was not achieved for identical seeds and algorithms.
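A minimal NumPy sketch of the within-library side of this question, showing seeded bit-reproducibility for one algorithm and divergence between algorithms; the paper's harder question is whether the same streams survive across Python, NumPy, TensorFlow, PyTorch, and the reference C implementations:

```python
import numpy as np

# Identical seed and algorithm (Philox) give a bit-identical stream.
a = np.random.Generator(np.random.Philox(12345)).random(5)
b = np.random.Generator(np.random.Philox(12345)).random(5)
print(np.array_equal(a, b))  # True: reproducible within NumPy

# A different algorithm (PCG64) with the same seed diverges immediately.
c = np.random.Generator(np.random.PCG64(12345)).random(5)
print(np.array_equal(a, c))  # False
```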
{"title":"Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations","authors":"Benjamin Antunes, David R. C Hill","doi":"arxiv-2401.17345","DOIUrl":"https://doi.org/arxiv-2401.17345","url":null,"abstract":"Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine\u0000learning technologies because they are interesting for numerous methods. The\u0000field of machine learning holds the potential for substantial advancements\u0000across various domains, as exemplified by recent breakthroughs in Large\u0000Language Models (LLMs). However, despite the growing interest, persistent\u0000concerns include issues related to reproducibility and energy consumption.\u0000Reproducibility is crucial for robust scientific inquiry and explainability,\u0000while energy efficiency underscores the imperative to conserve finite global\u0000resources. This study delves into the investigation of whether the leading\u0000Pseudo-Random Number Generators (PRNGs) employed in machine learning languages,\u0000libraries, and frameworks uphold statistical quality and numerical\u0000reproducibility when compared to the original C implementation of the\u0000respective PRNG algorithms. Additionally, we aim to evaluate the time\u0000efficiency and energy consumption of various implementations. Our experiments\u0000encompass Python, NumPy, TensorFlow, and PyTorch, utilizing the Mersenne\u0000Twister, PCG, and Philox algorithms. Remarkably, we verified that the temporal\u0000performance of machine learning technologies closely aligns with that of\u0000C-based implementations, with instances of achieving even superior\u0000performances. On the other hand, it is noteworthy that ML technologies consumed\u0000only 10% more energy than their C-implementation counterparts. However, while\u0000statistical quality was found to be comparable, achieving numerical\u0000reproducibility across different platforms for identical seeds and algorithms\u0000was not achieved.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139657139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-validation is a widely used technique for assessing the performance of predictive models on unseen data. Many predictive models, such as Kernel-Based Partial Least-Squares (PLS) models, require the computation of $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ using only training-set samples from the input and output matrices, $\mathbf{X}$ and $\mathbf{Y}$, respectively. In this work, we present three algorithms that efficiently compute these matrices. The first allows no column-wise preprocessing. The second allows column-wise centering around the training-set means. The third allows column-wise centering and column-wise scaling around the training-set means and standard deviations. Demonstrating correctness and superior computational complexity, they offer significant cross-validation speedup compared with straightforward cross-validation and previous work on fast cross-validation, all without data leakage. Their suitability for parallelization is highlighted with an open-source Python implementation combining our algorithms with Improved Kernel PLS.
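The core shortcut for the no-preprocessing case can be sketched as follows (a minimal NumPy illustration, not the author's implementation; the centered and scaled variants additionally need fold-wise means and standard deviations):

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.random((1000, 50)), rng.random((1000, 3))

# Precompute the full products once; each training-set product is then the
# full product minus the held-out fold's contribution.
XtX_full, XtY_full = X.T @ X, X.T @ Y

for val_idx in np.array_split(np.arange(1000), 10):
    Xv, Yv = X[val_idx], Y[val_idx]
    XtX_train = XtX_full - Xv.T @ Xv   # O(|fold| p^2) instead of O(n p^2)
    XtY_train = XtY_full - Xv.T @ Yv
    mask = np.ones(1000, dtype=bool)
    mask[val_idx] = False
    assert np.allclose(XtX_train, X[mask].T @ X[mask])  # matches full recomputation
```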
{"title":"Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $mathbf{X}^mathbf{T}mathbf{X}$ and $mathbf{X}^mathbf{T}mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments","authors":"Ole-Christian Galbo Engstrøm","doi":"arxiv-2401.13185","DOIUrl":"https://doi.org/arxiv-2401.13185","url":null,"abstract":"Cross-validation is a widely used technique for assessing the performance of\u0000predictive models on unseen data. Many predictive models, such as Kernel-Based\u0000Partial Least-Squares (PLS) models, require the computation of\u0000$mathbf{X}^{mathbf{T}}mathbf{X}$ and $mathbf{X}^{mathbf{T}}mathbf{Y}$\u0000using only training set samples from the input and output matrices,\u0000$mathbf{X}$ and $mathbf{Y}$, respectively. In this work, we present three\u0000algorithms that efficiently compute these matrices. The first one allows no\u0000column-wise preprocessing. The second one allows column-wise centering around\u0000the training set means. The third one allows column-wise centering and\u0000column-wise scaling around the training set means and standard deviations.\u0000Demonstrating correctness and superior computational complexity, they offer\u0000significant cross-validation speedup compared with straight-forward\u0000cross-validation and previous work on fast cross-validation - all without data\u0000leakage. Their suitability for parallelization is highlighted with an\u0000open-source Python implementation combining our algorithms with Improved Kernel\u0000PLS.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139559558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We examine a class of geometric theorems on cyclic $2n$-gons. We prove that if we take $n$ disjoint pairs of sides, each pair separated by an even number of polygon sides, then there is a linear combination of the angles between those sides which is constant. We present a formula for the linear combination, which provides a theorem statement in terms of those angles. We describe a program which uses this result to generate new geometry proof problems and their solutions.
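The claim can be spot-checked numerically for a cyclic hexagon ($n = 3$), pairing each side with the one opposite it (separated by two sides on one arc). Using directed angles between lines taken mod $\pi$, the alternating sum of the three pair angles comes out constant; this is one concrete instance under those conventions, not the paper's general formula:

```python
import numpy as np

rng = np.random.default_rng(0)
for trial in range(5):
    # random convex cyclic hexagon: sorted vertex angles on the unit circle
    theta = np.sort(rng.uniform(0, 2 * np.pi, 6))
    V = np.c_[np.cos(theta), np.sin(theta)]

    def side_dir(i):
        # direction (mod pi) of the side from vertex i to vertex i+1
        d = V[(i + 1) % 6] - V[i]
        return np.arctan2(d[1], d[0]) % np.pi

    # directed angles between the three pairs of opposite sides
    ang = [(side_dir(i + 3) - side_dir(i)) % np.pi for i in range(3)]
    combo = (ang[0] - ang[1] + ang[2]) % np.pi
    print(min(combo, np.pi - combo))  # ~0 on every trial: the combination is constant
```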
{"title":"Theorem Discovery Amongst Cyclic Polygons","authors":"Philip ToddSaltire Software","doi":"arxiv-2401.13002","DOIUrl":"https://doi.org/arxiv-2401.13002","url":null,"abstract":"We examine a class of geometric theorems on cyclic 2n-gons. We prove that if\u0000we take n disjoint pairs of sides, each pair separated by an even number of\u0000polygon sides, then there is a linear combination of the angles between those\u0000sides which is constant. We present a formula for the linear combination, which\u0000provides a theorem statement in terms of those angles. We describe a program\u0000which uses this result to generate new geometry proof problems and their\u0000solutions.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139559561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neil Zhao, Emilee Brockner, Asia Winslow, Megan Seraydarian
The classic question of whether one should walk or run in the rain to remain the least wet has inspired a myriad of solutions, ranging from physically performing test runs in rainy conditions to mathematically modeling human movement through rain. This manuscript approaches the classical problem by simulating movement through rainfall using MATLAB; the simulation also generalizes to snowfall. An increase in walking speed resulted in a corresponding decrease in raindrop and snowflake collisions. When raindrops or snowflakes were given a horizontal movement vector due to wind, a local minimum in collisions was achieved when moving parallel to the wind at the same horizontal speed as the drops; no local minimum was detected for antiparallel movement. In general, our simulation revealed that the faster one moves, the drier one remains.
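A simplified closed-form flux model reproduces both qualitative findings; this is a back-of-the-envelope sketch with made-up areas and speeds, not the authors' MATLAB simulation:

```python
import numpy as np

def wetness(v, distance=100.0, rain_speed=9.0, wind=0.0,
            top_area=0.1, front_area=0.7, density=1.0):
    """Water collected covering `distance` at walking speed v: vertical flux
    onto the top surface plus drops swept up frontally (relative to wind)."""
    t = distance / v                                  # exposure time
    top = density * top_area * rain_speed * t         # falls onto the top
    front = density * front_area * abs(v - wind) * t  # swept up frontally
    return top + front

speeds = np.linspace(0.5, 10, 200)
no_wind = [wetness(v) for v in speeds]            # strictly decreasing: run!
tailwind = [wetness(v, wind=3.0) for v in speeds]
print(speeds[np.argmin(tailwind)])  # ~3.0: driest when matching the drops' horizontal speed
```

With a headwind (negative wind), the frontal term grows with v and the total is strictly decreasing in v, so no interior minimum appears, matching the antiparallel result above.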
{"title":"A Simulation of Optimal Dryness When Moving in the Rain or Snow Using MATLAB","authors":"Neil Zhao, Emilee Brockner, Asia Winslow, Megan Seraydarian","doi":"arxiv-2401.12023","DOIUrl":"https://doi.org/arxiv-2401.12023","url":null,"abstract":"The classic question of whether one should walk or run in the rain to remain\u0000the least wet has inspired a myriad of solutions ranging from physically\u0000performing test runs in raining conditions to mathematically modeling human\u0000movement through rain. This manuscript approaches the classical problem by\u0000simulating movement through rainfall using MATLAB. Our simulation was\u0000generalizable to include snowfall as well. An increase in walking speed\u0000resulted in a corresponding decrease in raindrop and snowflake collisions. When\u0000raindrops or snowflakes were given a horizontal movement vector due to wind, a\u0000local minimum in collisions was achieved when moving in parallel with the same\u0000horizontal speed as the raindrop; no local minimum was detected with\u0000antiparallel movement. In general, our simulation revealed that the faster one\u0000moves, the drier one remains.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139559994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}