{"title":"Data-driven linear complexity low-rank approximation of general kernel matrices: A geometric approach","authors":"Difeng Cai, Edmond Chow, Yuanzhe Xi","doi":"10.1002/nla.2519","DOIUrl":null,"url":null,"abstract":"A general, <i>rectangular</i> kernel matrix may be defined as <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0001\" display=\"inline\" location=\"graphic/nla2519-math-0001.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<msub>\n<mrow>\n<mi>K</mi>\n</mrow>\n<mrow>\n<mi>i</mi>\n<mi>j</mi>\n</mrow>\n</msub>\n<mo>=</mo>\n<mi>κ</mi>\n<mo stretchy=\"false\">(</mo>\n<msub>\n<mrow>\n<mi>x</mi>\n</mrow>\n<mrow>\n<mi>i</mi>\n</mrow>\n</msub>\n<mo>,</mo>\n<msub>\n<mrow>\n<mi>y</mi>\n</mrow>\n<mrow>\n<mi>j</mi>\n</mrow>\n</msub>\n<mo stretchy=\"false\">)</mo>\n</mrow>\n$$ {K}_{ij}=\\kappa \\left({x}_i,{y}_j\\right) $$</annotation>\n</semantics></math> where <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0002\" display=\"inline\" location=\"graphic/nla2519-math-0002.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>κ</mi>\n<mo stretchy=\"false\">(</mo>\n<mi>x</mi>\n<mo>,</mo>\n<mi>y</mi>\n<mo stretchy=\"false\">)</mo>\n</mrow>\n$$ \\kappa \\left(x,y\\right) $$</annotation>\n</semantics></math> is a kernel function and where <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0003\" display=\"inline\" location=\"graphic/nla2519-math-0003.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>X</mi>\n<mo>=</mo>\n<msubsup>\n<mrow>\n<mo stretchy=\"false\">{</mo>\n<msub>\n<mrow>\n<mi>x</mi>\n</mrow>\n<mrow>\n<mi>i</mi>\n</mrow>\n</msub>\n<mo stretchy=\"false\">}</mo>\n</mrow>\n<mrow>\n<mi>i</mi>\n<mo>=</mo>\n<mn>1</mn>\n</mrow>\n<mrow>\n<mi>m</mi>\n</mrow>\n</msubsup>\n</mrow>\n$$ X={\\left\\{{x}_i\\right\\}}_{i=1}^m $$</annotation>\n</semantics></math> and <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0004\" display=\"inline\" location=\"graphic/nla2519-math-0004.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>Y</mi>\n<mo>=</mo>\n<msubsup>\n<mrow>\n<mo stretchy=\"false\">{</mo>\n<msub>\n<mrow>\n<mi>y</mi>\n</mrow>\n<mrow>\n<mi>i</mi>\n</mrow>\n</msub>\n<mo stretchy=\"false\">}</mo>\n</mrow>\n<mrow>\n<mi>i</mi>\n<mo>=</mo>\n<mn>1</mn>\n</mrow>\n<mrow>\n<mi>n</mi>\n</mrow>\n</msubsup>\n</mrow>\n$$ Y={\\left\\{{y}_i\\right\\}}_{i=1}^n $$</annotation>\n</semantics></math> are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0005\" display=\"inline\" location=\"graphic/nla2519-math-0005.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>X</mi>\n</mrow>\n$$ X $$</annotation>\n</semantics></math> and <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0006\" display=\"inline\" location=\"graphic/nla2519-math-0006.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>Y</mi>\n</mrow>\n$$ Y $$</annotation>\n</semantics></math> are large and are arbitrarily distributed, such as away from each other, “intermingled”, identical, and so forth. 
Such rectangular kernel matrices may arise, for example, in Gaussian process regression where <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0007\" display=\"inline\" location=\"graphic/nla2519-math-0007.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>X</mi>\n</mrow>\n$$ X $$</annotation>\n</semantics></math> corresponds to the training data and <math altimg=\"urn:x-wiley:nla:media:nla2519:nla2519-math-0008\" display=\"inline\" location=\"graphic/nla2519-math-0008.png\" overflow=\"scroll\">\n<semantics>\n<mrow>\n<mi>Y</mi>\n</mrow>\n$$ Y $$</annotation>\n</semantics></math> corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function, and avoid forming the matrix, and thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with respect to the size of data for a fixed approximation rank. The main idea in this paper is to <i>geometrically</i> select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.","PeriodicalId":49731,"journal":{"name":"Numerical Linear Algebra with Applications","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2023-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Numerical Linear Algebra with Applications","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/nla.2519","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
Citations: 2
Abstract
A general, rectangular kernel matrix may be defined as $K_{ij} = \kappa(x_i, y_j)$, where $\kappa(x, y)$ is a kernel function and where $X = \{x_i\}_{i=1}^m$ and $Y = \{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the point sets $X$ and $Y$ are large and arbitrarily distributed: far away from each other, "intermingled", identical, and so forth. Such rectangular kernel matrices may arise, for example, in Gaussian process regression, where $X$ corresponds to the training data and $Y$ to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix explicitly, which rules out most algebraic techniques. In particular, we seek methods that scale linearly or nearly linearly with the size of the data for a fixed approximation rank. The main idea in this paper is to geometrically select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
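The abstract does not spell out the construction itself, but a common realization of this idea is a Nyström-type factorization $K \approx K_{XZ}\, K_{ZZ}^{+}\, K_{ZY}$ built from a geometrically selected set of landmark points $Z$. The sketch below is a minimal illustration under assumptions of our own (a Gaussian kernel and farthest-point sampling as the selection rule); it is not the paper's algorithm, whose selection strategy is guided by the paper's analysis. Forming only the three factors costs $O((m+n)k)$ kernel evaluations plus $O(k^3)$ for the $k \times k$ pseudoinverse, which is the linear scaling in the data size that the paper seeks.

```python
# Hypothetical Nystrom-style sketch (NOT the authors' method): approximate
# K ~= K_XZ @ pinv(K_ZZ) @ K_ZY, with landmarks Z chosen geometrically by
# farthest-point sampling over X and Y combined. The Gaussian kernel and
# the sampling rule are illustrative assumptions only.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix between point sets A (m x d) and B (n x d).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def farthest_point_sampling(P, k, seed=0):
    # Greedily pick k points of P that are mutually far apart (a simple
    # geometric selection rule standing in for the paper's criterion).
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(P)))]
    d = np.linalg.norm(P - P[idx[0]], axis=1)
    for _ in range(k - 1):
        idx.append(int(np.argmax(d)))
        d = np.minimum(d, np.linalg.norm(P - P[idx[-1]], axis=1))
    return P[idx]

def low_rank_kernel_approx(X, Y, k, sigma=1.0):
    # Landmarks drawn from the union of X and Y so both sets are covered,
    # even when they are far apart or intermingled.
    Z = farthest_point_sampling(np.vstack([X, Y]), k)
    K_XZ = gaussian_kernel(X, Z, sigma)   # m x k
    K_ZZ = gaussian_kernel(Z, Z, sigma)   # k x k
    K_ZY = gaussian_kernel(Z, Y, sigma)   # k x n
    # Return the factors; the full product is never needed in applications.
    return K_XZ, np.linalg.pinv(K_ZZ), K_ZY

# Usage: compare against the exact kernel matrix on small random data.
rng = np.random.default_rng(1)
X, Y = rng.standard_normal((500, 3)), rng.standard_normal((400, 3))
U, C, V = low_rank_kernel_approx(X, Y, k=50)
K = gaussian_kernel(X, Y)
err = np.linalg.norm(U @ C @ V - K) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.2e}")
```

Note that only the factors are ever stored; in a downstream task such as Gaussian process prediction, one applies them to a vector left to right, keeping the cost linear in $m + n$ for fixed rank $k$.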
About the Journal
Manuscripts submitted to Numerical Linear Algebra with Applications should include large-scale, broad-interest applications in which challenging computational results are integral to the approach investigated and analysed. Manuscripts that, in the Editor's view, do not satisfy these conditions will not be accepted for review.
Numerical Linear Algebra with Applications receives submissions that address the development, analysis, and application of linear algebra algorithms for problems arising in multilinear (tensor) algebra, in statistics (such as Markov chains), and in the deterministic and stochastic modelling of large-scale networks, as well as algorithm development, performance analysis, and related computational aspects.
Topics covered include: Standard and Generalized Conjugate Gradients, Multigrid and Other Iterative Methods; Preconditioning Methods; Direct Solution Methods; Numerical Methods for Eigenproblems; Newton-like Methods for Nonlinear Equations; Parallel and Vectorizable Algorithms in Numerical Linear Algebra; Application of Methods of Numerical Linear Algebra in Science, Engineering and Economics.