Kesen Wang, Maicon J. Karling, Reinaldo B. Arellano-Valle, Marc G. Genton
The unified skew-t (SUT) is a flexible parametric multivariate distribution that accounts for skewness and heavy tails in the data. A few of its properties can be found scattered in the literature or in a parametrization that does not follow the original one for unified skew-normal (SUN) distributions, yet a systematic study is lacking. In this work, explicit properties of the multivariate SUT distribution are presented, such as its stochastic representations, moments, SUN-scale mixture representation, linear transformation, additivity, marginal distribution, canonical form, quadratic form, conditional distribution, change of latent dimensions, Mardia measures of multivariate skewness and kurtosis, and the non-identifiability issue. These results are given in a parametrization that reduces to the original SUN distribution as a sub-model, hence facilitating the use of the SUT for applications. Several models based on the SUT distribution are provided for illustration.
{"title":"Multivariate Unified Skew-t Distributions And Their Properties","authors":"Kesen Wang, Maicon J. Karling, Reinaldo B. Arellano-Valle, Marc G. Genton","doi":"arxiv-2311.18294","DOIUrl":"https://doi.org/arxiv-2311.18294","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"90 5"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521285"}
Parameter identification problems in partial differential equations (PDEs) consist in determining one or more unknown functional parameters in a PDE. Here, the Bayesian nonparametric approach to such problems is considered. Focusing on the representative example of inferring the diffusivity function in an elliptic PDE from noisy observations of the PDE solution, the performance of Bayesian procedures based on Gaussian process priors is investigated. Recent asymptotic theoretical guarantees establishing posterior consistency and convergence rates are reviewed and expanded upon. An implementation of the associated posterior-based inference is provided, and illustrated via a numerical simulation study where two different discretisation strategies are devised. The reproducible code is available at: https://github.com/MattGiord.
{"title":"Bayesian nonparametric inference in PDE models: asymptotic theory and implementation","authors":"Matteo Giordano","doi":"arxiv-2311.18322","DOIUrl":"https://doi.org/arxiv-2311.18322","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"84 4"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521327"}
Arthur Stéphanovitch, Eddie Aamari, Clément Levrard
We provide non-asymptotic rates of convergence of the Wasserstein Generative Adversarial Network (WGAN) estimator. We build neural network classes representing the generators and discriminators which yield a GAN that achieves the minimax optimal rate for estimating a certain probability measure $\mu$ with support in $\mathbb{R}^p$. The probability $\mu$ is considered to be the push forward of the Lebesgue measure on the $d$-dimensional torus $\mathbb{T}^d$ by a map $g^\star:\mathbb{T}^d\rightarrow \mathbb{R}^p$ of smoothness $\beta+1$. Measuring the error with the $\gamma$-Hölder Integral Probability Metric (IPM), we obtain, up to logarithmic factors, the minimax optimal rate $O(n^{-\frac{\beta+\gamma}{2\beta +d}}\vee n^{-\frac{1}{2}})$, where $n$ is the sample size, $\beta$ determines the smoothness of the target measure $\mu$, $\gamma$ is the smoothness of the IPM ($\gamma=1$ is the Wasserstein case) and $d\leq p$ is the intrinsic dimension of $\mu$. In the process, we derive a sharp interpolation inequality between Hölder IPMs. This novel result in the theory of function spaces generalizes classical interpolation inequalities to the case where the measures involved have densities on different manifolds.
{"title":"Wasserstein GANs are Minimax Optimal Distribution Estimators","authors":"Arthur Stéphanovitch, Eddie Aamari, Clément Levrard","doi":"arxiv-2311.18613","DOIUrl":"https://doi.org/arxiv-2311.18613","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"90 4"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521286"}
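As a small worked illustration of the rate stated in the WGAN abstract, the sketch below computes the exponent $a$ for which the rate reads $n^{-a}$ (logarithmic factors ignored). The function name `wgan_rate_exponent` is hypothetical and introduced only for this example; the only input taken from the abstract is the formula $n^{-\frac{\beta+\gamma}{2\beta+d}}\vee n^{-\frac{1}{2}}$.

```python
from fractions import Fraction


def wgan_rate_exponent(beta, gamma, d):
    """Return a such that the stated rate is n**(-a), ignoring log factors.

    Hypothetical helper: it only evaluates the formula quoted in the abstract.
    """
    smooth_term = Fraction(beta + gamma) / Fraction(2 * beta + d)
    # The maximum of n^{-s} and n^{-1/2} is the slower of the two rates,
    # which corresponds to the smaller of the two exponents.
    return min(smooth_term, Fraction(1, 2))


if __name__ == "__main__":
    for beta, gamma, d in [(1, 1, 2), (1, 1, 4), (2, 1, 10)]:
        print(beta, gamma, d, wgan_rate_exponent(beta, gamma, d))
```

Since $\frac{\beta+\gamma}{2\beta+d}\geq\frac{1}{2}$ is equivalent to $2\gamma\geq d$, the parametric exponent $1/2$ is attained exactly in that regime; for larger intrinsic dimension the smoothness-dependent term governs the rate.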
This paper examines eight measures of skewness and the Mardia measure of kurtosis for skew-elliptical distributions. Multivariate measures of skewness considered include Mardia, Malkovich-Afifi, Isogai, Song, Balakrishnan-Brito-Quiroz, Móri, Rohatgi and Székely, Kollo and Srivastava measures. We first study the canonical form of skew-elliptical distributions, and then derive exact expressions of all measures of skewness and kurtosis for the family of skew-elliptical distributions, except for Song's measure. Specifically, the formulas of these measures for skew normal, skew $t$, skew logistic, skew Laplace, skew Pearson type II and skew Pearson type VII distributions are obtained. Next, as in Malkovich and Afifi (1973), test statistics based on a random sample are constructed for illustrating the usefulness of the established results. In a Monte Carlo simulation study, different measures of skewness and kurtosis for $2$-dimensional skewed distributions are calculated and compared. Finally, real data are analyzed to demonstrate all the results.
{"title":"An analysis of multivariate measures of skewness and kurtosis of skew-elliptical distributions","authors":"Baishuai Zuo, Narayanaswamy Balakrishnan, Chuancun Yin","doi":"arxiv-2311.18176","DOIUrl":"https://doi.org/arxiv-2311.18176","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"92 4"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521275"}
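For readers unfamiliar with the Mardia measures named in the abstract above, here is a minimal sketch of their classical sample versions, assuming NumPy is available. It is not the paper's derivation (which concerns exact population expressions for skew-elliptical families); the function name `mardia_measures` and the toy data are illustrative assumptions.

```python
import numpy as np


def mardia_measures(x):
    """Classical sample Mardia skewness b_{1,p} and kurtosis b_{2,p}.

    x: array of shape (n, p), one observation per row.
    Uses the maximum-likelihood covariance (divisor n), as in Mardia's
    original definitions; the covariance must be invertible.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n
    # g[i, j] = (x_i - mean)^T S^{-1} (x_j - mean)
    g = centered @ np.linalg.inv(cov) @ centered.T
    skewness = float((g ** 3).sum() / n ** 2)
    kurtosis = float((np.diag(g) ** 2).mean())
    return skewness, kurtosis


if __name__ == "__main__":
    # A point set symmetric about the origin has zero sample skewness.
    data = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    print(mardia_measures(data))
```

The symmetric toy sample is chosen so that the result can be checked by hand: every cubic term cancels with its mirror image, and each squared Mahalanobis distance equals 2.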
Existing statistical methods for compositional data analysis are inadequate for many modern applications for two reasons. First, modern compositional datasets, for example in microbiome research, display traits such as high-dimensionality and sparsity that are poorly modelled with traditional approaches. Second, assessing -- in an unbiased way -- how summary statistics of a composition (e.g., racial diversity) affect a response variable is not straightforward. In this work, we propose a framework based on hypothetical data perturbations that addresses both issues. Unlike existing methods for compositional data, we do not transform the data and instead use perturbations to define interpretable statistical functionals on the compositions themselves, which we call average perturbation effects. These average perturbation effects, which can be employed in many applications, naturally account for confounding that biases frequently used marginal dependence analyses. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. We analyze the proposed estimators empirically on simulated data and demonstrate advantages over existing techniques on US census and microbiome data. For all proposed estimators, we provide confidence intervals with uniform asymptotic coverage guarantees.
{"title":"Perturbation-based Analysis of Compositional Data","authors":"Anton Rask Lundborg, Niklas Pfister","doi":"arxiv-2311.18501","DOIUrl":"https://doi.org/arxiv-2311.18501","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"86 2"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521320"}
Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d.) data assumptions, and provide local convergence results only. This paper investigates the online identification and data clustering problems for two basic classes of MLRs, by introducing two corresponding new online identification algorithms based on the expectation-maximization (EM) principle. It is shown that both algorithms will converge globally without resorting to the traditional i.i.d. data assumptions. The main challenge in our investigation lies in the fact that the gradient of the maximum likelihood function does not have a unique zero, and a key step in our analysis is to establish the stability of the corresponding differential equation in order to apply the celebrated Ljung's ODE method. It is also shown that the within-cluster error and the probability that the new data is categorized into the correct cluster are asymptotically the same as those in the case of known parameters. Finally, numerical simulations are provided to verify the effectiveness of our online algorithms.
{"title":"Global Convergence of Online Identification for Mixed Linear Regression","authors":"Yujing Liu, Zhixin Liu, Lei Guo","doi":"arxiv-2311.18506","DOIUrl":"https://doi.org/arxiv-2311.18506","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"84 5"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521326"}
This paper establishes the functional average as an important estimand for causal inference. The significance of the estimand lies in its robustness against traditional issues of confounding. We prove that this robustness holds even when the probability distribution of the outcome, conditional on treatment or some other vector of adjusting variables, differs almost arbitrarily from its counterfactual analogue. This paper also examines possible estimators of the functional average, including the sample mid-range, and proposes a new type of bootstrap for robust statistical inference: the Hoeffding bootstrap. After this, the paper explores a new class of variables, the $\mathcal{U}$ class of variables, that simplifies the estimation of functional averages. This class of variables is also used to establish mean exchangeability in some cases and to provide the results of elementary statistical procedures, such as linear regression and the analysis of variance, with causal interpretations. Simulation evidence is provided. The methods of this paper are also applied to a National Health and Nutrition Survey data set to investigate the causal effect of exercise on the blood pressure of adult smokers.
{"title":"The Functional Average Treatment Effect","authors":"Shane Sparkes, Erika Garcia, Lu Zhang","doi":"arxiv-2312.00219","DOIUrl":"https://doi.org/arxiv-2312.00219","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"87 1"},"publicationDate":"2023-11-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521318"}
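The sample mid-range mentioned in the abstract above as one candidate estimator is simple enough to state in code. The following is a minimal sketch of that textbook statistic only; it does not reproduce the paper's functional-average framework or its Hoeffding bootstrap, and the name `sample_midrange` is an illustrative choice.

```python
def sample_midrange(values):
    """Midpoint between the smallest and largest observation."""
    values = list(values)
    if not values:
        raise ValueError("sample_midrange requires at least one observation")
    return (min(values) + max(values)) / 2


if __name__ == "__main__":
    print(sample_midrange([1, 4, 9]))
```

Because it depends only on the two extreme order statistics, the mid-range ignores how the remaining observations are distributed between them, which is the feature that distinguishes it from the sample mean.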
A powerful statistical interpolating concept, which we call \emph{fully lifted} (fl), is introduced and presented while establishing a connection between bilinearly indexed random processes and their corresponding fully decoupled (linearly indexed) comparative alternatives. Despite on occasion very involved technical considerations, the final interpolating forms and their underlying relations admit rather elegant expressions that provide a conceivably highly desirable and useful tool for further studying various aspects of random processes and their applications. We also discuss the generality of the considered models and show that they encompass many well known random structures and optimization problems to which the obtained results then automatically apply.
{"title":"Fully lifted interpolating comparisons of bilinearly indexed random processes","authors":"Mihailo Stojnic","doi":"arxiv-2311.18092","DOIUrl":"https://doi.org/arxiv-2311.18092","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"91 3"},"publicationDate":"2023-11-29","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521281"}
Treatment-covariate interaction tests are commonly applied by researchers to examine whether the treatment effect varies across patient subgroups defined by baseline characteristics. The objective of this study is to explore treatment-covariate interaction tests involving covariate-adaptive randomization. Without assuming a parametric data generation model, we investigate usual interaction tests and observe that they tend to be conservative: specifically, their limiting rejection probabilities under the null hypothesis do not exceed the nominal level and are typically strictly lower than it. To address this problem, we propose modifications to the usual tests to obtain corresponding exact tests. Moreover, we introduce a novel class of stratified-adjusted interaction tests that are simple, broadly applicable, and more powerful than the usual and modified tests. Our findings are relevant to two types of interaction tests: one involving stratification covariates and the other involving additional covariates that are not used for randomization.
{"title":"Interaction tests with covariate-adaptive randomization","authors":"Likun Zhang, Wei Ma","doi":"arxiv-2311.17445","DOIUrl":"https://doi.org/arxiv-2311.17445","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"92 1"},"publicationDate":"2023-11-29","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521278"}
We study the statistical capacity of the classical binary perceptrons with general thresholds $\kappa$. After recognizing the connection between the capacity and the bilinearly indexed (bli) random processes, we utilize recent progress in studying such processes to characterize the capacity. In particular, we rely on \emph{fully lifted} random duality theory (fl RDT) established in \cite{Stojnicflrdt23} to create a general framework for studying the perceptrons' capacities. Successful underlying numerical evaluations are required for the framework (and ultimately the entire fl RDT machinery) to become fully practically operational. We present results obtained in that direction and uncover that the capacity characterizations are achieved on the second (first non-trivial) level of \emph{stationarized} full lifting. The obtained results \emph{exactly} match the replica symmetry breaking predictions obtained through statistical physics replica methods in \cite{KraMez89}. Most notably, for the famous zero-threshold scenario, $\kappa=0$, we uncover the well known $\alpha\approx0.8330786$ scaled capacity.
{"title":"Binary perceptrons capacity via fully lifted random duality theory","authors":"Mihailo Stojnic","doi":"arxiv-2312.00073","DOIUrl":"https://doi.org/arxiv-2312.00073","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"83 3"},"publicationDate":"2023-11-29","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"138521331"}
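To make the object of the binary perceptron abstract concrete, the sketch below brute-forces the underlying feasibility question for tiny instances: does some weight vector $w\in\{-1,1\}^n$ satisfy $\frac{1}{\sqrt{n}}x_i^\top w\geq\kappa$ for every pattern $x_i$? This formulation, including the $1/\sqrt{n}$ normalization, is the usual one and is assumed here, since the abstract does not spell it out. The function `has_binary_solution` is hypothetical and has nothing to do with the fl RDT machinery itself.

```python
from itertools import product
from math import sqrt


def has_binary_solution(patterns, kappa=0.0):
    """Exhaustively search w in {-1, +1}^n with x.w / sqrt(n) >= kappa for all x.

    Assumed formulation of the binary perceptron constraint set; the search
    costs 2**n evaluations, so it is only meant for very small n.
    """
    n = len(patterns[0])
    for w in product((-1, 1), repeat=n):
        if all(
            sum(xi * wi for xi, wi in zip(x, w)) / sqrt(n) >= kappa
            for x in patterns
        ):
            return True
    return False


if __name__ == "__main__":
    print(has_binary_solution([[1, 1, 1]], kappa=0.0))
    print(has_binary_solution([[1, 1, 1], [-1, -1, -1]], kappa=0.0))
```

The scaled capacity quoted in the abstract concerns the limiting ratio of the number of random patterns to $n$ at which such a search stops succeeding; an exhaustive search of this kind cannot reach that regime and only illustrates the constraint being counted.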