The strong $L^2$-approximation of occupation time functionals is studied with respect to discrete observations of a $d$-dimensional càdlàg process. Upper bounds on the error are obtained under weak assumptions, generalizing previous results in the literature considerably. The approach relies on regularity for the marginals of the process and applies also to non-Markovian processes, such as fractional Brownian motion. The results are used to approximate occupation times and local times. For Brownian motion, the upper bounds are shown to be sharp up to a log-factor.
{"title":"Approximation of occupation time functionals","authors":"R. Altmeyer","doi":"10.3150/21-BEJ1328","DOIUrl":"https://doi.org/10.3150/21-BEJ1328","url":null,"abstract":"The strong L2-approximation of occupation time functionals is studied with respect to discrete observations of a d-dimensional cadlag process. Upper bounds on the error are obtained under weak assumptions, generalizing previous results in the literature considerably. The approach relies on regularity for the marginals of the process and applies also to non-Markovian processes, such as fractional Brownian motion. The results are used to approximate occupation times and local times. For Brownian motion, the upper bounds are shown to be sharp up to a log-factor.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2714-2739"},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44905777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
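As a rough illustration of the setting in the abstract above, the following sketch simulates a one-dimensional Brownian path on a fine grid and approximates the occupation time of $[0,\infty)$ from coarser discrete observations. The left-endpoint Riemann sum is an assumed, natural discretization and is not taken from the abstract; the function names `simulate_brownian_path` and `riemann_occupation_time` and the `stride` parameter are hypothetical.

```python
import math
import random


def simulate_brownian_path(n_steps, horizon=1.0, seed=0):
    """Return n_steps + 1 values of a Brownian path on an equidistant grid of [0, horizon]."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # Independent Gaussian increments with variance dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def riemann_occupation_time(path, horizon, indicator, stride=1):
    """Left-endpoint Riemann sum of indicator(X_t), using every `stride`-th grid point.

    Assumption of this sketch: `stride` must divide the number of grid steps.
    """
    n_steps = len(path) - 1
    if n_steps % stride != 0:
        raise ValueError("stride must divide the number of steps")
    width = horizon / n_steps * stride  # spacing between used observations
    return sum(indicator(path[k]) * width for k in range(0, n_steps, stride))


if __name__ == "__main__":
    path = simulate_brownian_path(10_000, seed=42)
    positive = lambda x: 1.0 if x >= 0 else 0.0
    fine = riemann_occupation_time(path, 1.0, positive)            # proxy for the true functional
    coarse = riemann_occupation_time(path, 1.0, positive, stride=100)
    print("fine grid:", fine, "coarse grid:", coarse, "abs. error:", abs(fine - coarse))
```

Averaging the squared difference between the coarse and fine values over many seeds gives an empirical counterpart of the $L^2$ error discussed in the abstract; the fine-grid value is only a proxy for the true occupation time.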
Let $X_1,\dots,X_n$ be independent and identically distributed random vectors in $\mathbb{R}^d$. Suppose $\mathbb{E}X_1=0$ and $\mathrm{Cov}(X_1)=I_d$, where $I_d$ is the $d\times d$ identity matrix. Suppose further that there exist positive constants $t_0$ and $c_0$ such that $\mathbb{E}e^{t_0|X_1|}\leq c_0<\infty$, where $|\cdot|$ denotes the Euclidean norm. Let $W=\frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$ and let $Z$ be a $d$-dimensional standard normal random vector. Let $Q$ be a $d\times d$ symmetric positive definite matrix whose largest eigenvalue is 1. We prove that for $0\leq x\leq \varepsilon n^{1/6}$, \begin{equation*} \left| \frac{\mathbb{P}(|Q^{1/2}W|>x)}{\mathbb{P}(|Q^{1/2}Z|>x)}-1 \right| \leq C \left( \frac{1+x^5}{\det(Q^{1/2})\,n}+\frac{x^6}{n}\right) \quad \text{for } d\geq 5 \end{equation*} and \begin{equation*} \left| \frac{\mathbb{P}(|Q^{1/2}W|>x)}{\mathbb{P}(|Q^{1/2}Z|>x)}-1 \right| \leq C \left( \frac{1+x^3}{\det(Q^{1/2})\,n^{\frac{d}{d+1}}}+\frac{x^6}{n}\right) \quad \text{for } 1\leq d\leq 4, \end{equation*} where $\varepsilon$ and $C$ are positive constants depending only on $d$, $t_0$, and $c_0$. This is a first extension of Cramér-type moderate deviation to the multivariate setting with a faster convergence rate than $1/\sqrt{n}$. The range of $x=o(n^{1/6})$ for the relative error to vanish and the dimension requirement $d\geq 5$ for the $1/n$ rate are both optimal. We prove our result using a new change of measure, a two-term Edgeworth expansion for the changed measure, and cancellation by symmetry for terms of the order $1/\sqrt{n}$.
{"title":"Cramér-type moderate deviation for quadratic forms with a fast rate","authors":"Xiao Fang, Song Liu, Q. Shao","doi":"10.3150/22-bej1549","DOIUrl":"https://doi.org/10.3150/22-bej1549","url":null,"abstract":"Let $X_1,dots, X_n$ be independent and identically distributed random vectors in $mathbb{R}^d$. Suppose $mathbb{E} X_1=0$, $mathrm{Cov}(X_1)=I_d$, where $I_d$ is the $dtimes d$ identity matrix. Suppose further that there exist positive constants $t_0$ and $c_0$ such that $mathbb{E} e^{t_0|X_1|}leq c_0<infty$, where $|cdot|$ denotes the Euclidean norm. Let $W=frac{1}{sqrt{n}}sum_{i=1}^n X_i$ and let $Z$ be a $d$-dimensional standard normal random vector. Let $Q$ be a $dtimes d$ symmetric positive definite matrix whose largest eigenvalue is 1. We prove that for $0leq xleq varepsilon n^{1/6}$, begin{equation*} left| frac{mathbb{P}(|Q^{1/2}W|>x)}{mathbb{P}(|Q^{1/2}Z|>x)}-1 right|leq C left( frac{1+x^5}{det{(Q^{1/2})}n}+frac{x^6}{n}right) quad text{for} dgeq 5 end{equation*} and begin{equation*} left| frac{mathbb{P}(|Q^{1/2}W|>x)}{mathbb{P}(|Q^{1/2}Z|>x)}-1 right|leq C left( frac{1+x^3}{det{(Q^{1/2})}n^{frac{d}{d+1}}}+frac{x^6}{n}right) quad text{for} 1leq dleq 4, end{equation*} where $varepsilon$ and $C$ are positive constants depending only on $d, t_0$, and $c_0$. This is a first extension of Cram'er-type moderate deviation to the multivariate setting with a faster convergence rate than $1/sqrt{n}$. The range of $x=o(n^{1/6})$ for the relative error to vanish and the dimension requirement $dgeq 5$ for the $1/n$ rate are both optimal. We prove our result using a new change of measure, a two-term Edgeworth expansion for the changed measure, and cancellation by symmetry for terms of the order $1/sqrt{n}$.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44915080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
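This paper gives a complementary proof of the partial generalized four moment theorem (PG4MT) stated in “Generalized Four Moment Theorem (G4MT) and its Application to CLT for Spiked Eigenvalues of High-dimensional Covariance Matrices”. Because the G4MT proposed in that paper requires both matrices $X$ and $Y$ to satisfy the assumption $\max_{t,s}|u_{ts}|^2 E\{|x_{11}|^4 I(|x_{11}|<n)-\mu\}\to 0$ with the same $\mu$, which may be restrictive in real applications, we proposed a new G4MT, called PG4MT, without proof. After the manuscript was posted on arXiv, the authors received considerable interest in the proof of the PG4MT through private communications and found the PG4MT to be more general than the G4MT, so a detailed proof is necessary. Moreover, the PG4MT yields a CLT for spiked eigenvalues of sample covariance matrices that covers the work of Bai and Yao (J. Multivariate Anal. 106 (2012) 167–177) as a special case.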
{"title":"Partial generalized four moment theorem revisited","authors":"Dandan Jiang, Z. Bai","doi":"10.3150/20-BEJ1310","DOIUrl":"https://doi.org/10.3150/20-BEJ1310","url":null,"abstract":"This is a complementary proof of partial generalized 4 moment theorem (PG4MT) mentioned and described in “Generalized Four Moment Theorem (G4MT) and its Application to CLT for Spiked Eigenvalues of High-dimensional Covariance Matrices”. Since the G4MT proposed in that paper requires both the matrices X and Y satisfying the assumption maxt,s|uts|2E{|x11|4I(|x11|<n)−μ}→0 with the same μ which maybe restrictive in real applications, we proposed a new G4MT, called PG4MT, without proof. After the manuscript posed in ArXiv, the authors received high interests in the proof of PG4MT through private communications and find the PG4MT more general than G4MT, it is necessary to give a detailed proof of it. Moreover, it is found that the PG4MT derives a CLT of spiked eigenvalues of sample covariance matrices which covers the work in Bai and Yao (J. Multivariate Anal. 106 (2012) 167–177) as a special case.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2337-2352"},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44730685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Friedman's chi-square test is a non-parametric statistical test for $r\geq 2$ treatments across $n\geq 1$ trials to assess the null hypothesis that there is no treatment effect. We use Stein's method with an exchangeable pair coupling to derive an explicit bound on the distance between the distribution of Friedman's statistic and its limiting chi-square distribution, measured using smooth test functions. Our bound is of the optimal order $n^{-1}$, and also has an optimal dependence on the parameter $r$, in that the bound tends to zero if and only if $r/n\rightarrow 0$. From this bound, we deduce a Kolmogorov distance bound that decays to zero under the weaker condition $r^{1/2}/n\rightarrow 0$.
{"title":"Bounds for the chi-square approximation of Friedman’s statistic by Stein’s method","authors":"Robert E. Gaunt, G. Reinert","doi":"10.3150/22-bej1530","DOIUrl":"https://doi.org/10.3150/22-bej1530","url":null,"abstract":"Friedman's chi-square test is a non-parametric statistical test for $rgeq2$ treatments across $nge1$ trials to assess the null hypothesis that there is no treatment effect. We use Stein's method with an exchangeable pair coupling to derive an explicit bound on the distance between the distribution of Friedman's statistic and its limiting chi-square distribution, measured using smooth test functions. Our bound is of the optimal order $n^{-1}$, and also has an optimal dependence on the parameter $r$, in that the bound tends to zero if and only if $r/nrightarrow0$. From this bound, we deduce a Kolmogorov distance bound that decays to zero under the weaker condition $r^{1/2}/nrightarrow0$.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48404404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
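For readers unfamiliar with the statistic in the abstract above, here is a minimal sketch of the classical Friedman statistic for $n$ trials (rows) and $r$ treatments (columns), $Q = \frac{12}{nr(r+1)}\sum_{j=1}^r R_j^2 - 3n(r+1)$, where $R_j$ is the sum of within-trial ranks of treatment $j$. The function name `friedman_statistic` is hypothetical, and the sketch assumes there are no ties within a trial, so no tie correction is applied.

```python
def friedman_statistic(data):
    """Friedman's statistic for a list of n trials, each a list of r treatment responses.

    Assumption of this sketch: no ties within a trial (no tie correction is applied).
    """
    n = len(data)
    r = len(data[0])
    rank_sums = [0.0] * r
    for row in data:
        # Rank the r responses within this trial, smallest response gets rank 1.
        order = sorted(range(r), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * r * (r + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (r + 1)


if __name__ == "__main__":
    # Three trials, three treatments; every trial ranks the treatments identically.
    print(friedman_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
```

Under the null hypothesis the statistic is usually compared with a chi-square distribution with $r-1$ degrees of freedom; the abstract concerns how accurate that comparison is for finite $n$ and $r$.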
Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper, a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not generalize well on new data, in the sense that networks which minimize the empirical risk do not achieve the optimal minimax rate of convergence for estimation of smooth regression functions.
{"title":"Over-parametrized deep neural networks minimizing the empirical risk do not generalize well","authors":"M. Kohler, A. Krzyżak","doi":"10.3150/21-BEJ1323","DOIUrl":"https://doi.org/10.3150/21-BEJ1323","url":null,"abstract":"Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper, a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not generalize well on a new data in the sense that networks which minimize the empirical risk do not achieve the optimal minimax rate of convergence for estimation of smooth regression functions.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2564-2597"},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42450516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We study supervised and semi-supervised algorithms in the set-valued classification framework with controlled expected size. While the former methods can use only $n$ labeled samples, the latter can also make use of $N$ additional unlabeled data. We obtain semi-supervised minimax rates of convergence under the $\alpha$-margin assumption and a $\beta$-Hölder condition on the conditional distribution of labels. Our analysis implies that if no further assumption is made, there is no supervised method that outperforms the semi-supervised estimator proposed in this work: the best achievable rate for any supervised method is $O(n^{-1/2})$, even if the margin assumption is extremely favorable; in contrast, the developed semi-supervised estimator can achieve the faster rate of convergence $O((n/\log n)^{-(1+\alpha)\beta/(2\beta+d)})$, provided that sufficiently many unlabeled samples are available. We also show that under an additional smoothness assumption, supervised methods are able to achieve faster rates and the unlabeled sample cannot improve the rate of convergence. Finally, a numerical study supports our theory and emphasizes the relevance of the assumptions we required from an empirical perspective.
{"title":"Minimax semi-supervised set-valued approach to multi-class classification","authors":"Evgenii Chzhen, Christophe Denis, Mohamed Hebiri","doi":"10.3150/20-BEJ1313","DOIUrl":"https://doi.org/10.3150/20-BEJ1313","url":null,"abstract":"We study supervised and semi-supervised algorithms in the set-valued classification framework with controlled expected size. While the former methods can use only n labeled samples, the latter are able to make use of N additional unlabeled data. We obtain semi-supervised minimax rates of convergence under the α-margin assumption and a β-Hölder condition on the conditional distribution of labels. Our analysis implies that if no further assumption is made, there is no supervised method that outperforms the semi-supervised estimator proposed in this work – the best achievable rate for any supervised method is O(n−1/2), even if the margin assumption is extremely favorable; on the contrary, the developed semi-supervised estimator can achieve faster O((n/ logn)−(1+α)β/(2β+d)) rate of convergence provided that sufficiently many unlabeled samples are available. We also show that under additional smoothness assumption, supervised methods are able to achieve faster rates and the unlabeled sample cannot improve the rate of convergence. Finally, a numerical study supports our theory and emphasizes the relevance of the assumptions we required from an empirical perspective.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47747471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Superposition of renewal processes is common in practice, and it is challenging to estimate the distribution of the individual inter-occurrence time associated with the renewal process. This is because with only aggregated event history, the link between the observed recurrence times and the respective renewal processes is completely missing, rendering existing theory and methods inapplicable. In this article, we propose a nonparametric procedure to estimate the inter-occurrence time distribution by properly deconvoluting the renewal equation with the empirical renewal function. By carefully controlling the discretization errors and properly handling challenges due to implicit and non-smooth mapping via the renewal equation, our theoretical analysis establishes the consistency and asymptotic normality of the nonparametric estimators. The proposed nonparametric distribution estimators are then utilized for developing theoretically valid and computationally efficient inferences when a parametric family is assumed for the individual renewal process. Comprehensive simulations show that compared with the existing maximum likelihood method, the proposed parametric estimation procedure is much faster, and the proposed estimators are more robust to round-off errors in the observed data.
{"title":"Estimating the inter-occurrence time distribution from superposed renewal processes","authors":"Xiaoyu Li, Z. Ye, C. Tang","doi":"10.3150/21-BEJ1331","DOIUrl":"https://doi.org/10.3150/21-BEJ1331","url":null,"abstract":"Superposition of renewal processes is common in practice, and it is challenging to estimate the distribution of the individual inter-occurrence time associated with the renewal process. This is because with only aggregated event history, the link between the observed recurrence times and the respective renewal processes are completely missing, rendering existing theory and methods inapplicable. In this article, we propose a nonparametric procedure to estimate the inter-occurrence time distribution by properly deconvoluting the renewal equation with the empirical renewal function. By carefully controlling the discretization errors and properly handling challenges due to implicit and non-smooth mapping via the renewal equation, our theoretical analysis establishes the consistency and asymptotic normality of the nonparametric estimators. The proposed nonparametric distribution estimators are then utilized for developing theoretically valid and computationally efficient inferences when a parametric family is assumed for the individual renewal process. Comprehensive simulations show that compared with the existing maximum likelihood method, the proposed parametric estimation procedure is much faster, and the proposed estimators are more robust to round-off errors in the observed data.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2804-2826"},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44009027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article develops a two-sample nonparametric goodness-of-fit (GOF) test for uniform stochastic ordering (USO) when observations are taken in pairs. We propose a data-driven critical value that controls the type I error and yields a consistent test. A simulation study illustrates the finite-sample performance of our test. All the proofs are included in the supplemental file.
{"title":"Testing against uniform stochastic ordering with paired observations","authors":"Dewei Wang, Chuan-Fa Tang","doi":"10.3150/21-BEJ1322","DOIUrl":"https://doi.org/10.3150/21-BEJ1322","url":null,"abstract":"This article develops a two-sample nonparametric goodness-of-fit (GOF) test for uniform stochastic ordering (USO) when observations are taken in pairs. We propose a data-driven critical value that controls the type I error and yields a consistent test. A simulation study illustrates the finite-sample performance of our test. All the proofs are included in the supplemental file.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":"27 1","pages":"2556-2563"},"PeriodicalIF":1.5,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47444065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We investigate a family of discrete-time stationary processes defined by multiple stable integrals and renewal processes with infinite means. Depending on the parameters, the model may exhibit either short-range or long-range dependence. The main contribution is to establish a phase transition in terms of the tail processes that characterize local clustering of extremes. Moreover, in the short-range dependence regime, the model provides an example where the extremal index is different from the candidate extremal index.
{"title":"Tail processes for stable-regenerative multiple-stable model","authors":"Shuyang Bai, Yizao Wang","doi":"10.3150/22-bej1582","DOIUrl":"https://doi.org/10.3150/22-bej1582","url":null,"abstract":"We investigate a family of discrete-time stationary processes defined by multiple stable integrals and renewal processes with infinite means. The model may exhibit behaviors of short-range or long-range dependence, respectively, depending on the parameters. The main contribution is to establish a phase transition in terms of the tail processes that characterize local clustering of extremes. Moreover, in the short-range dependence regime, the model provides an example where the extremal index is different from the candidate extremal index.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47389282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the i.i.d. Bernoulli field $\mu_p$ on $\mathbb{Z}^d$ with occupation density $p\in[0,1]$. To each realization of the set of occupied sites we apply a thinning map that removes all occupied sites that are isolated in graph distance. We show that, while this map seems non-invasive for large $p$, as it changes only a small fraction $p(1-p)^{2d}$ of sites, there is $p(d)<1$ such that for all $p\in(p(d),1)$ the resulting measure is a non-Gibbsian measure, i.e., it does not possess a continuous version of its finite-volume conditional probabilities. On the other hand, for small $p$, the Gibbs property is preserved.
{"title":"Gibbsianness and non-Gibbsianness for Bernoulli lattice fields under removal of isolated sites","authors":"B. Jahnel, C. Kuelske","doi":"10.3150/22-bej1572","DOIUrl":"https://doi.org/10.3150/22-bej1572","url":null,"abstract":"We consider the i.i.d. Bernoulli field $mu_p$ on $mathbb{Z}^d$ with occupation density $pin [0,1]$. To each realization of the set of occupied sites we apply a thinning map that removes all occupied sites that are isolated in graph distance. We show that, while this map seems non-invasive for large $p$, as it changes only a small fraction $p(1-p)^{2d}$ of sites, there is $p(d)<1$ such that for all $pin(p(d),1)$ the resulting measure is a non-Gibbsian measure, i.e., it does not possess a continuous version of its finite-volume conditional probabilities. On the other hand, for small $p$, the Gibbs property is preserved.","PeriodicalId":55387,"journal":{"name":"Bernoulli","volume":" ","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45757610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
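The thinning map described in the abstract above can be sketched on a finite box. The code below is an illustration under stated assumptions, not the authors' construction: it works in $d=2$, replaces $\mathbb{Z}^2$ by a `side` × `side` torus (periodic boundary, `side` at least 3), and reads "isolated" as "occupied with no occupied nearest neighbour", which is consistent with the fraction $p(1-p)^{2d}$ quoted in the abstract. The function names are hypothetical.

```python
import random


def sample_bernoulli_field(side, p, seed=0):
    """Sample an i.i.d. Bernoulli(p) occupation field on a side x side grid."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(side)] for _ in range(side)]


def remove_isolated_sites(field):
    """Return a copy of the field with every isolated occupied site removed.

    A site is isolated if it is occupied and none of its four nearest
    neighbours (with periodic boundary conditions) is occupied.
    """
    side = len(field)
    thinned = [row[:] for row in field]
    for i in range(side):
        for j in range(side):
            if not field[i][j]:
                continue
            neighbours = (
                field[(i - 1) % side][j],
                field[(i + 1) % side][j],
                field[i][(j - 1) % side],
                field[i][(j + 1) % side],
            )
            if not any(neighbours):
                thinned[i][j] = False
    return thinned


if __name__ == "__main__":
    side, p = 200, 0.9
    field = sample_bernoulli_field(side, p, seed=1)
    thinned = remove_isolated_sites(field)
    changed = sum(a != b for ra, rb in zip(field, thinned) for a, b in zip(ra, rb))
    print("empirical fraction changed:", changed / side**2)
    print("p(1-p)^(2d) with d=2:      ", p * (1 - p) ** 4)
```

The decision to remove a site is always made from the original field, not from the partially thinned copy, so the map is applied to all sites simultaneously.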