In this paper, we give a stable and efficient method for fixing self-intersections and non-manifold parts in a given embedded simplicial complex. In addition, we show how symmetry properties can be used for further optimisation. We prove an initialisation criterion for the computation of the outer hull of an embedded simplicial complex. To regularise the outer hull of the retriangulated surface, we present a method to remedy non-manifold edges and points. We also give a modification of the outer hull algorithm that determines the chambers of a complex, which gives rise to many new insights. All of these methods have applications in many areas, for example in 3D printing, artistic realisations of 3D models, or fixing errors introduced by scanning equipment used for tomography. Implementations of the proposed algorithms are given in the computer algebra system GAP4. For verification of our methods, we use a dataset of highly self-intersecting symmetric icosahedra.
"A Framework for Self-Intersecting Surfaces (SOS): Symmetric Optimisation for Stability". Christian Amend, Tom Goertzen. arXiv:2312.02113 (arXiv - CS - Mathematical Software), 2023-12-04.
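The non-manifold edges the paper sets out to remedy are, combinatorially, edges incident to more than two triangles. The detection step can be sketched in a few lines of Python; this is only a loose illustration of the criterion, not the authors' GAP4 implementation:

```python
from collections import defaultdict

def non_manifold_edges(triangles):
    """Return the edges incident to more than two triangles.

    `triangles` is a list of 3-tuples of vertex indices; every edge of
    a closed manifold surface belongs to exactly two triangles.
    """
    incidence = defaultdict(int)
    for a, b, c in triangles:
        for e in ((a, b), (b, c), (a, c)):
            incidence[tuple(sorted(e))] += 1
    return sorted(e for e, n in incidence.items() if n > 2)

# Three triangles glued along the same edge (1, 2): a non-manifold "book".
book = [(1, 2, 3), (1, 2, 4), (1, 2, 5)]
print(non_manifold_edges(book))  # [(1, 2)]
```

A tetrahedron boundary, by contrast, has every edge in exactly two triangles and reports no non-manifold edges.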
During the initialization of a supercomputer job, no useful calculations are performed. A high proportion of initialization time results in idle computing resources and lower computational efficiency. Methods and algorithms that combine jobs into groups are used to optimize the scheduling of jobs with a high initialization proportion. This article considers the influence of the scale ratio setting in the job-group formation algorithm on the performance metrics of the workload manager. The study was carried out on an Alea-based workload manager model developed by the authors. The model makes it possible to conduct a large number of experiments in reasonable time without losing simulation accuracy. We performed a series of experiments involving various workload characteristics. The article presents the results of a study of the influence of the scale ratio on efficiency metrics for different initialization time proportions and input workflows of varying intensity and homogeneity. The presented results allow workload manager administrators to set a scale ratio that provides an appropriate balance between contradictory efficiency metrics.
"Scale Ratio Tuning of Group Based Job Scheduling in HPC Systems". Lyakhovets D. S., Baranov A. V., Telegin P. N. arXiv:2311.17889 (arXiv - CS - Mathematical Software), 2023-11-29.
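The benefit of grouping jobs with a high initialization proportion can be illustrated with a toy utilization model. This is our own sketch, far simpler than the authors' Alea-based simulator, and it ignores queueing and scheduling effects entirely:

```python
def utilization(init, work, group_size, jobs):
    """Fraction of machine time spent on useful work when `jobs` jobs,
    each needing `init` seconds of initialization and `work` seconds of
    computation, run in groups that share a single initialization.
    Toy model: groups are assumed to run back to back on one node.
    """
    groups = -(-jobs // group_size)          # ceiling division
    total = groups * init + jobs * work      # one initialization per group
    return jobs * work / total

# Initialization is 50% of a single job's runtime; grouping amortizes it.
for g in (1, 4, 16):
    print(g, round(utilization(init=60, work=60, group_size=g, jobs=64), 3))
# 1 0.5
# 4 0.8
# 16 0.941
```

Larger groups raise utilization but, in a real scheduler, also delay individual jobs, which is the tension the scale ratio tunes.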
We introduce Lineax, a library bringing linear solves and linear least-squares to the JAX+Equinox scientific computing ecosystem. Lineax uses general linear operators, and unifies linear solves and least-squares into a single, autodifferentiable API. Solvers and operators are user-extensible, without requiring the user to implement any custom derivative rules to get differentiability. Lineax is available at https://github.com/google/lineax.
"Lineax: unified linear solves and linear least-squares in JAX and Equinox". Jason Rader, Terry Lyons, Patrick Kidger. arXiv:2311.17283 (arXiv - CS - Mathematical Software), 2023-11-28.
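The idea of one entry point that unifies exact solves and least-squares can be sketched with NumPy. The function name and dispatch rule below are our own illustration, not Lineax's actual API, which works on general operators and is autodifferentiable:

```python
import numpy as np

def linear_solve(A, b):
    """One entry point for Ax = b: an exact solve for square full-rank
    matrices, least-squares otherwise.  A hypothetical sketch of the
    unified-API idea only; Lineax additionally exploits operator
    structure (diagonal, triangular, ...) and supports autodiff.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if m == n and np.linalg.matrix_rank(A) == n:
        return np.linalg.solve(A, b)         # exact solve
    x, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares fallback
    return x

x = linear_solve([[2.0, 0.0], [0.0, 4.0]], [2.0, 8.0])  # exact: [1, 2]
y = linear_solve([[1.0], [1.0]], [0.0, 2.0])            # least-squares: [1]
```

The point of a unified, extensible API is that callers never branch on whether their operator happens to be square or well-conditioned.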
In this paper, we develop a high-precision satellite orbit determination model for satellites orbiting the Earth. Solving this model entails numerically integrating the differential equation of motion governing a two-body system, employing Fehlberg's formulation and the Runge-Kutta class of embedded integrators with adaptive stepsize control. Relevant primary perturbing forces included in this mathematical model are the full force gravitational field model, Earth's atmospheric drag, third body gravitational effects and solar radiation pressure. Development of the high-precision model required accounting for the perturbing influences of Earth radiation pressure, Earth tides and relativistic effects. The model is then implemented to obtain a high-fidelity Earth orbiting satellite propagator, namely the Satellite Ephemeris Determiner (SED), which is comparable to the popular High Precision Orbit Propagator (HPOP). The architecture of SED, the methodology employed, and the numerical results obtained are presented.
"Mathematical Modelling and a Numerical Solution for High Precision Satellite Ephemeris Determination". Aravind Gundakaram, Abhirath Sangala, Aditya Sai Ellendula, Prachi Kansal, Lanii Lakshitaa, Suchir Reddy Punuru, Nethra Naveen, Sanjitha Jaggumantri. arXiv:2311.15028 (arXiv - CS - Mathematical Software), 2023-11-25.
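The adaptive stepsize control used with Fehlberg's embedded Runge-Kutta pairs can be illustrated with the simplest embedded pair, Heun(2)/Euler(1), on a scalar ODE. This sketches only the control logic, not the paper's orbital force model or its higher-order coefficients:

```python
def heun_euler_adaptive(f, t, y, t_end, tol=1e-6, h=0.1):
    """Integrate y' = f(t, y) with an embedded Heun(2)/Euler(1) pair.
    The gap between the two estimates drives the stepsize -- the same
    control idea Fehlberg's higher-order pairs use.
    """
    while t < t_end:
        h = min(h, t_end - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1                 # first order (Euler)
        y_high = y + h * (k1 + k2) / 2.0   # second order (Heun)
        err = abs(y_high - y_low)          # local error estimate
        if err <= tol:
            t, y = t + h, y_high           # accept the step
        # grow or shrink h from the estimate, with a 0.9 safety factor
        h *= min(2.0, max(0.1, 0.9 * (tol / max(err, 1e-16)) ** 0.5))
    return y

# y' = -y, y(0) = 1: y(1) should be close to e^{-1} ~ 0.3679
print(heun_euler_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0))
```

Fehlberg's contribution was pairs that reuse stages between the two orders, so the error estimate comes almost for free.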
Mehdi Seyfi, Amin Banitalebi-Dehkordi, Zirui Zhou, Yong Zhang
Combinatorial optimization finds an optimal solution within a discrete set of variables and constraints. The field has seen tremendous progress both in research and industry. With the success of deep learning in the past decade, a recent trend in combinatorial optimization has been to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning (ML) models. In this paper, we investigate two essential aspects of machine learning algorithms for combinatorial optimization: temporal characteristics and attention. We argue that for the task of variable selection in the branch-and-bound (B&B) algorithm, incorporating the temporal information as well as the bipartite graph attention improves the solver's performance. We support our claims with intuitions and numerical results over several standard datasets used in the literature and competitions. Code is available at: https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=047c6cf2-8463-40d7-b92f-7b2ca998e935
"Exact Combinatorial Optimization with Temporo-Attentional Graph Neural Networks". arXiv:2311.13843 (arXiv - CS - Mathematical Software), 2023-11-23.
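Variable selection is the slot in branch and bound that a learned policy such as the proposed temporo-attentional GNN fills. A minimal sketch with a pluggable selection rule, on a toy 0/1 knapsack rather than a MILP solver:

```python
def branch_and_bound(values, weights, capacity, select_variable):
    """0/1 knapsack by branch and bound.  `select_variable` picks which
    undecided item to branch on next -- this is the heuristic slot that
    learned (e.g. GNN-based) policies replace inside MILP solvers.
    """
    best = 0
    n = len(values)

    def bound(fixed, free):  # optimistic bound: take every free item
        return sum(values[i] for i in fixed) + sum(values[i] for i in free)

    def rec(fixed, free, weight):
        nonlocal best
        best = max(best, sum(values[i] for i in fixed))
        if not free or bound(fixed, free) <= best:
            return                                        # prune
        i = select_variable(free)
        rest = free - {i}
        if weight + weights[i] <= capacity:
            rec(fixed | {i}, rest, weight + weights[i])   # branch: take i
        rec(fixed, rest, weight)                          # branch: skip i

    rec(frozenset(), frozenset(range(n)), 0)
    return best

# A trivial selection rule; a learned policy would score candidates instead.
print(branch_and_bound([6, 10, 12], [1, 2, 3], 5, select_variable=min))  # 22
```

Any sound selection rule reaches the same optimum; what changes is how many nodes the tree explores before the bound prunes the rest, which is exactly what the learned policies aim to reduce.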
Saeedeh Akbari Rokn Abadi, Melika Honarmand, Ali Hajialinaghi, Somayyeh Koohi
Background: Sequence comparison is essential in bioinformatics, serving various purposes such as taxonomy, functional inference, and drug discovery. The traditional method of aligning sequences for comparison is time-consuming, especially with large datasets. To overcome this, alignment-free methods have emerged as an alternative approach, prioritizing comparison scores over alignment itself. These methods compare sequences directly, without the need for alignment. However, accurately representing the relationships between sequences is a significant challenge in the design of these tools. Methods: One of the alignment-free comparison approaches utilizes the frequency of fixed-length substrings, known as K-mers, which serves as the foundation for many sequence comparison methods. However, a challenge arises in these methods when the substring length K is increased, as it leads to an exponential growth in the number of possible states. In this work, we explore the PC-mer method, which utilizes a more limited set of words whose number grows as 2^K rather than 4^K with K. We compared sequences and evaluated how the reduced input vector size influenced the performance of the PC-mer method. Results: For the evaluation, we selected the Clustal Omega method as our reference approach, alongside three alignment-free methods: kmacs, FFP, and alfpy (word count). These methods also leverage the frequency of K-mers. We applied all five methods to 9 datasets for comprehensive analysis. The results were compared using phylogenetic trees and metrics such as Robinson-Foulds distance and normalized quartet distance (nQD). Conclusion: Our findings indicate that, unlike reducing the input features in other alignment-independent methods, the PC-mer method exhibits competitive performance compared with the aforementioned methods, especially when the input sequences are highly varied.
"An Assessment of PC-mer's Performance in Alignment-Free Phylogenetic Tree Construction". arXiv:2311.12898 (arXiv - CS - Mathematical Software), 2023-11-21.
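The state-space growth the abstract refers to is easy to see in a plain K-mer profile, whose vectors have 4^K entries. Below is a sketch of the frequency-vector approach that PC-mer compresses; it is the baseline representation, not PC-mer itself:

```python
from itertools import product
from math import sqrt

def kmer_profile(seq, k):
    """Frequency vector over all 4**k possible k-mers -- the input size
    that PC-mer's 2**k encoding is designed to shrink."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[km] for km in kmers]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

a = kmer_profile("ACGTACGTACGT", 2)
b = kmer_profile("ACGTACGAACGT", 2)
print(len(a))                                            # 16 = 4**2 entries
print(cosine_distance(a, b) > cosine_distance(a, a))     # True
```

Pairwise distances like this feed distance-based tree builders such as neighbor joining, which is how alignment-free profiles end up as phylogenetic trees.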
Sara Faghih-Naini, Vadym Aizinger, Sebastian Kuckuk, Richard Angersbach, Harald Köstler
Heterogeneous computing and the exploitation of integrated CPU-GPU architectures have become a clear trend since the flattening of Moore's Law. In this work, we propose a numerical and algorithmic re-design of a p-adaptive quadrature-free discontinuous Galerkin (DG) method for the shallow water equations (SWE). Our new approach separates the computations of the non-adaptive (lower-order) and adaptive (higher-order) parts of the discretization from each other. Thereby, we can overlap computations of the lower-order and the higher-order DG solution components. Furthermore, we investigate the execution times of the main computational kernels and use automatic code generation to optimize their distribution between the CPU and the GPU. Several setups, including a prototype of a tsunami simulation in a tide-driven flow scenario, are investigated, and the results show that significant performance improvements can be achieved in suitable setups.
"p-adaptive discontinuous Galerkin method for the shallow water equations on heterogeneous computing architectures". arXiv:2311.11348 (arXiv - CS - Mathematical Software), 2023-11-19.
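The separation of lower-order and higher-order updates can be caricatured as two independent loops over per-element coefficient lists. This is a toy sketch of the data-layout idea only, with no relation to the generated SWE kernels; because the two loops share no data dependencies, one could run on the CPU while the other runs on the GPU:

```python
def split_update(coeffs, low_kernel, high_kernel):
    """Apply the lower-order update to every element and the
    higher-order update only where extra modes are present.
    `coeffs[e]` holds the modal coefficients of element e, one entry
    per polynomial mode (so its length encodes the local order p).
    """
    low = [low_kernel(c[0]) for c in coeffs]                  # all elements
    high = [[high_kernel(m) for m in c[1:]] for c in coeffs]  # adaptive part
    return [[l] + h for l, h in zip(low, high)]

# Elements with local polynomial order 0, 1 and 2 (1, 2 and 3 modes).
elements = [[1.0], [1.0, 0.5], [1.0, 0.5, 0.25]]
print(split_update(elements, lambda x: 2 * x, lambda x: -x))
# [[2.0], [2.0, -0.5], [2.0, -0.5, -0.25]]
```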
Robert van de Geijn, Maggie Myers, RuQing G. Xu, Devin Matthews
We apply the FLAME methodology to derive algorithms hand in hand with their proofs of correctness for the computation of the $ L T L^T $ decomposition (with and without pivoting) of a skew-symmetric matrix. The approach yields known as well as new algorithms, presented using the FLAME notation. A number of BLAS-like primitives are exposed at the core of blocked algorithms that can attain high performance. The insights can be easily extended to yield algorithms for computing the $ L T L^T $ decomposition of a symmetric matrix.
"Deriving Algorithms for Triangular Tridiagonalization of a (Skew-)Symmetric Matrix". arXiv:2311.10700 (arXiv - CS - Mathematical Software), 2023-11-17.
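The unblocked, unpivoted version of the factorization (Parlett-Reid style) can be sketched with NumPy. This reproduces only the textbook algorithm; the paper's contribution is the FLAME derivation of pivoted and blocked variants, which this does not attempt:

```python
import numpy as np

def ltl_skew(A):
    """Unblocked L T L^T factorization of a skew-symmetric matrix A
    without pivoting: L unit lower triangular, T skew-symmetric
    tridiagonal.  Pivots (the subdiagonal entries) are assumed nonzero.
    """
    T = A.astype(float).copy()
    n = T.shape[0]
    L = np.eye(n)
    for k in range(n - 2):
        pivot = T[k + 1, k]
        for i in range(k + 2, n):
            m = T[i, k] / pivot
            T[i, :] -= m * T[k + 1, :]   # congruence transform:
            T[:, i] -= m * T[:, k + 1]   # row update, then column update
            L[i, k + 1] = m              # record the multiplier
    return L, T

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B - B.T                              # random skew-symmetric matrix
L, T = ltl_skew(A)
print(np.allclose(L @ T @ L.T, A))       # True
```

Because every update is a congruence, skew-symmetry is preserved at each step, which is why T comes out tridiagonal rather than merely triangular.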
Alexis Toumi, Richie Yeung, Boldizsár Poór, Giovanni de Felice
DisCoPy is a Python toolkit for computing with monoidal categories. It comes with two flexible data structures for string diagrams: the first for planar monoidal categories, based on lists of layers, and the second for symmetric monoidal categories, based on cospans of hypergraphs. Algorithms for functor application then allow string diagrams to be translated into code for numerical computation, be it differentiable, probabilistic or quantum. This report gives an overview of the library and the new developments released in its version 1.0. In particular, we showcase the implementation of diagram equality for a large fragment of the hierarchy of graphical languages for monoidal categories, as well as a new syntax for defining string diagrams as Python functions.
"DisCoPy: the Hierarchy of Graphical Languages in Python". arXiv:2311.10608 (arXiv - CS - Mathematical Software), 2023-11-17.
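The list-of-layers representation for planar monoidal categories can be sketched in a few lines. This is a hypothetical minimal version written for illustration, much poorer than DisCoPy's actual classes:

```python
class Box:
    """A generator with a domain and codomain, e.g. f : x -> y @ y."""
    def __init__(self, name, dom, cod):
        self.name, self.dom, self.cod = name, list(dom), list(cod)

class Diagram:
    """A planar string diagram as a list of layers, each a triple
    (wires to the left, box, wires to the right) -- the list-of-layers
    idea behind DisCoPy's planar data structure.
    """
    def __init__(self, dom, layers):
        self.dom, self.layers = list(dom), layers

    @property
    def cod(self):
        wires = list(self.dom)
        for left, box, right in self.layers:
            assert wires == left + box.dom + right, "ill-typed layer"
            wires = left + box.cod + right   # thread wires through the box
        return wires

    def then(self, other):
        assert self.cod == other.dom, "composition type mismatch"
        return Diagram(self.dom, self.layers + other.layers)

f = Box("f", ["x"], ["y", "y"])
g = Box("g", ["y"], ["z"])
d = Diagram(["x"], [([], f, [])]).then(
    Diagram(["y", "y"], [([], g, ["y"]), (["z"], g, [])]))
print(d.cod)  # ['z', 'z']
```

Functor application then amounts to walking the layer list and replacing each box with, say, a tensor contraction, which is how such diagrams become numerical code.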
We demonstrate a multiplication method based on numbers represented as a set of polynomial radix-2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a set of algorithms implemented in Python. We demonstrate the method to be faster than both the Number Theoretic Transform (NTT) and Karatsuba multiplication within a certain bit range; both are also implemented in Python for comparison with the polynomial radix-2 integer method. We show that any integer or real number can be expressed as a list of integer indices representing a finite series in base two. The finite-series integer index representation of a number can then be stored and distributed across multiple CPUs/GPUs. We show that addition and multiplication can be applied as two's complement additions operating on the index representations and can be fully distributed across a given CPU/GPU architecture. We demonstrate fully distributed arithmetic operations such that the 'polynomial integer index multiplication' method overcomes the current limitation of parallel multiplication methods, i.e., the need to share common core memory and common disk for the calculation of results and intermediate results.
"Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation". Mark Stocks. arXiv:2311.09922 (arXiv - CS - Mathematical Software), 2023-11-16.
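The core representation can be sketched as follows. This is our guess at the idea from the abstract, not the paper's algorithms: since 2^i * 2^j = 2^(i+j), the product of two index lists is the multiset of pairwise index sums, after which carries collapse duplicate indices (two copies of index k become one index k+1):

```python
def to_indices(n):
    """Represent a positive integer as the list of its set-bit indices,
    i.e. n == sum(2**i for i in indices)."""
    return [i for i in range(n.bit_length()) if (n >> i) & 1]

def from_indices(indices):
    return sum(1 << i for i in indices)

def index_multiply(a_idx, b_idx):
    """Multiply two index-represented numbers.  Each (i, j) pair is
    independent, which is what makes the scheme easy to distribute;
    the carry pass at the end normalizes duplicate indices.
    """
    counts = {}
    for i in a_idx:
        for j in b_idx:
            counts[i + j] = counts.get(i + j, 0) + 1
    result, k = [], 0
    while counts:
        c = counts.pop(k, 0)
        if c & 1:
            result.append(k)          # odd count: bit k survives
        if c >> 1:                    # pairs carry into index k + 1
            counts[k + 1] = counts.get(k + 1, 0) + (c >> 1)
        k += 1
    return result

a, b = 1234, 5678
print(from_indices(index_multiply(to_indices(a), to_indices(b))) == a * b)  # True
```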