This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions. We extend the analysis of maximum margin classifiers to $\alpha$ sub-exponential distributions, where $\alpha \in (0,2]$, generalizing previous work that focused on sub-gaussian inputs. Our main result provides generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. We prove that under certain conditions on the dimensionality $p$ and feature vector magnitude $\|\mu\|$, the misclassification error of the maximum margin classifier asymptotically approaches the noise level. This work contributes to the understanding of benign overfitting in more robust distribution settings and demonstrates that the phenomenon persists even with heavier-tailed inputs than previously studied.
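As a rough numerical illustration of this setting (a sketch with toy constants of ours, not the paper's construction or conditions): the script below draws $n \ll p$ samples whose class means are $\pm\mu$ with additive Laplace noise (Laplace tails are $\alpha$ sub-exponential with $\alpha = 1$, heavier than gaussian), flips a fraction of the labels, and runs gradient descent on the unregularized logistic loss, which is known to converge in direction to the maximum margin classifier (Soudry et al. 2018).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, noise = 50, 2000, 0.1        # overparameterized regime: n << p
mu = np.zeros(p); mu[0] = 10.0     # mean direction, ||mu|| = 10

def sample(m):
    y = rng.choice([-1, 1], size=m)                  # clean labels
    x = np.outer(y, mu) + rng.laplace(size=(m, p))   # heavy-tailed inputs
    flip = rng.random(m) < noise                     # label noise at rate `noise`
    return x, np.where(flip, -y, y)

X, y = sample(n)
w = np.zeros(p)
for _ in range(20000):             # GD on unregularized logistic loss
    margins = np.clip(y * (X @ w), -30, 30)
    w += 0.01 / n * X.T @ (y / (1 + np.exp(margins)))

Xt, yt = sample(5000)
print("train error:", np.mean(np.sign(X @ w) != y))    # ideally 0: interpolation
print("test error :", np.mean(np.sign(Xt @ w) != yt))  # close to the noise level
```

The point of the experiment is benign overfitting in miniature: the classifier interpolates the noisy training labels, yet its test error sits near the label-noise level rather than blowing up.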
{"title":"Benign Overfitting for $α$ Sub-exponential Input","authors":"Kota Okudo, Kei Kobayashi","doi":"arxiv-2409.00733","DOIUrl":"https://doi.org/arxiv-2409.00733","url":null,"abstract":"This paper investigates the phenomenon of benign overfitting in binary\u0000classification problems with heavy-tailed input distributions. We extend the\u0000analysis of maximum margin classifiers to $alpha$ sub-exponential\u0000distributions, where $alpha in (0,2]$, generalizing previous work that\u0000focused on sub-gaussian inputs. Our main result provides generalization error\u0000bounds for linear classifiers trained using gradient descent on unregularized\u0000logistic loss in this heavy-tailed setting. We prove that under certain\u0000conditions on the dimensionality $p$ and feature vector magnitude $|mu|$,\u0000the misclassification error of the maximum margin classifier asymptotically\u0000approaches the noise level. This work contributes to the understanding of\u0000benign overfitting in more robust distribution settings and demonstrates that\u0000the phenomenon persists even with heavier-tailed inputs than previously\u0000studied.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires the specification of an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case an exploratory form of bi-factor analysis is needed to find the bi-factor structure. Unfortunately, there are few methods for exploratory bi-factor analysis, with the exception of a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However, this method only finds approximate bi-factor structures, as it does not yield an exact bi-factor loading structure, even after applying hard thresholding. In this paper, we propose a constraint-based optimisation method that learns an exact bi-factor loading structure from data, overcoming the issue with the rotation-based method. The key to the proposed method is a mathematical characterisation of the bi-factor loading structure as a set of equality constraints, which allows us to formulate the exploratory bi-factor analysis problem as a constrained optimisation problem in a continuous domain and solve the optimisation problem with an augmented Lagrangian method. The power of the proposed method is shown via simulation studies and a real data example. Extending the proposed method to exploratory hierarchical factor analysis is also discussed. The code is available at https://anonymous.4open.science/r/Bifactor-ALM-C1E6.
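To make the optimisation machinery concrete, here is a minimal augmented Lagrangian (method of multipliers) loop for a generic equality-constrained problem. The toy loss and the product-equals-zero constraint, loosely mimicking "each observed variable loads on at most one group factor", are placeholders of ours, not the authors' characterisation.

```python
import numpy as np

def augmented_lagrangian(grad_f, c, jac_c, x0, rho=10.0, outer=40, inner=300, lr=1e-2):
    """Minimize f(x) subject to c(x) = 0 by the method of multipliers."""
    x = x0.copy()
    lam = np.zeros_like(c(x0))                 # Lagrange multiplier estimates
    for _ in range(outer):
        for _ in range(inner):                 # approximately minimize L_rho(x, lam)
            x -= lr * (grad_f(x) + jac_c(x).T @ (lam + rho * c(x)))
        lam += rho * c(x)                      # dual update on the multipliers
    return x

# Toy problem: min ||x - b||^2  subject to  x[0] * x[1] = 0
b = np.array([1.0, 0.8, -0.5])
grad_f = lambda x: 2 * (x - b)
c = lambda x: np.array([x[0] * x[1]])
jac_c = lambda x: np.array([[x[1], x[0], 0.0]])
print(augmented_lagrangian(grad_f, c, jac_c, np.zeros(3)))  # approx [1.0, 0.0, -0.5]
```

The constrained solution zeroes out the smaller loading exactly, which is the qualitative behaviour that distinguishes an exact method from rotation followed by hard thresholding.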
{"title":"Exact Exploratory Bi-factor Analysis: A Constraint-based Optimisation Approach","authors":"Jiawei Qiao, Yunxiao Chen, Zhiliang Ying","doi":"arxiv-2409.00679","DOIUrl":"https://doi.org/arxiv-2409.00679","url":null,"abstract":"Bi-factor analysis is a form of confirmatory factor analysis widely used in\u0000psychological and educational measurement. The use of a bi-factor model\u0000requires the specification of an explicit bi-factor structure on the\u0000relationship between the observed variables and the group factors. In practice,\u0000the bi-factor structure is sometimes unknown, in which case an exploratory form\u0000of bi-factor analysis is needed to find the bi-factor structure. Unfortunately,\u0000there are few methods for exploratory bi-factor analysis, with the exception of\u0000a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However,\u0000this method only finds approximate bi-factor structures, as it does not yield\u0000an exact bi-factor loading structure, even after applying hard thresholding. In\u0000this paper, we propose a constraint-based optimisation method that learns an\u0000exact bi-factor loading structure from data, overcoming the issue with the\u0000rotation-based method. The key to the proposed method is a mathematical\u0000characterisation of the bi-factor loading structure as a set of equality\u0000constraints, which allows us to formulate the exploratory bi-factor analysis\u0000problem as a constrained optimisation problem in a continuous domain and solve\u0000the optimisation problem with an augmented Lagrangian method. The power of the\u0000proposed method is shown via simulation studies and a real data example.\u0000Extending the proposed method to exploratory hierarchical factor analysis is\u0000also discussed. The codes are available on\u0000``https://anonymous.4open.science/r/Bifactor-ALM-C1E6\".","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce directional regularity, a new definition of anisotropy for multivariate functional data. Instead of taking the conventional view, which treats anisotropy as a notion of smoothness along each coordinate dimension, directional regularity additionally views anisotropy through the lens of directions. We show that faster rates of convergence can be obtained through a change of basis by adapting to the directional regularity of a multivariate process. An algorithm for the estimation and identification of the change-of-basis matrix is constructed, made possible by the unique replication structure of functional data. Non-asymptotic bounds are provided for our algorithm, supplemented by numerical evidence from an extensive simulation study. We discuss two possible applications of the directional regularity approach, and advocate its consideration as a standard pre-processing step in multivariate functional data analysis.
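As a toy rendering of direction-dependent smoothness (our own illustration, not the authors' change-of-basis algorithm): with replicated surfaces one can estimate a Hurst-type exponent per coordinate direction by regressing log mean-squared increments on log lag. Anisotropy shows up as unequal exponents; the paper's point is that a well-chosen basis makes the favourable direction available to the estimator.

```python
import numpy as np

def directional_exponents(samples, lags=(1, 2, 4, 8)):
    """samples: array (n_rep, m, m) of replicated surfaces on an m x m grid.
    Returns a smoothness exponent H_d per coordinate direction, using
    E|X(t + l * e_d) - X(t)|^2 ~ l^(2 * H_d) and a log-log regression."""
    H = []
    for axis in (1, 2):                        # the two coordinate directions
        m = samples.shape[axis]
        logs = []
        for ell in lags:
            a = np.take(samples, range(ell, m), axis=axis)
            b = np.take(samples, range(m - ell), axis=axis)
            logs.append(np.log(np.mean((a - b) ** 2)))
        slope = np.polyfit(np.log(lags), logs, 1)[0]
        H.append(slope / 2)
    return H
```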
{"title":"Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data","authors":"Omar Kassi, Sunny G. W. Wang","doi":"arxiv-2409.00817","DOIUrl":"https://doi.org/arxiv-2409.00817","url":null,"abstract":"We introduce directional regularity, a new definition of anisotropy for\u0000multivariate functional data. Instead of taking the conventional view which\u0000determines anisotropy as a notion of smoothness along a dimension, directional\u0000regularity additionally views anisotropy through the lens of directions. We\u0000show that faster rates of convergence can be obtained through a change-of-basis\u0000by adapting to the directional regularity of a multivariate process. An\u0000algorithm for the estimation and identification of the change-of-basis matrix\u0000is constructed, made possible due to the unique replication structure of\u0000functional data. Non-asymptotic bounds are provided for our algorithm,\u0000supplemented by numerical evidence from an extensive simulation study. We\u0000discuss two possible applications of the directional regularity approach, and\u0000advocate its consideration as a standard pre-processing step in multivariate\u0000functional data analysis.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While differentially private synthetic data generation has been explored extensively in the literature, how to update this data in the future if the underlying private data changes is much less understood. We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time, tracking changes in the underlying private data. Our algorithm satisfies differential privacy for the entire input stream (continual differential privacy) and can be used for high-dimensional tabular data. Furthermore, we show the utility of our method via experiments on real-world datasets. The proposed algorithm builds upon a popular select, measure, fit, and iterate paradigm (used by offline synthetic data generation algorithms) and private counters for streams.
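The "private counters for streams" ingredient has a classic construction, the binary-tree mechanism (Dwork et al. 2010; Chan, Shi and Song 2011): each dyadic block of the stream receives Laplace noise once, and any running count is assembled from at most logarithmically many blocks. The sketch below is that textbook primitive with naming of our own choosing; it is not the paper's full select-measure-fit-iterate framework.

```python
import numpy as np

class PrivateCounter:
    """eps-DP running counts of a 0/1 stream of length < 2**(L + 1)."""
    def __init__(self, L, eps, seed=0):
        self.L, self.eps = L, eps
        self.rng = np.random.default_rng(seed)
        self.cache = {}                   # (level, block) -> noisy dyadic block sum
        self.x = []

    def update(self, bit):
        self.x.append(bit)

    def count(self):
        """Noisy count of all items seen so far."""
        t, pos, total = len(self.x), 0, 0.0
        for level in range(self.L, -1, -1):    # binary decomposition of [0, t)
            if t & (1 << level):
                key = (level, pos >> level)
                if key not in self.cache:      # each block's noise is drawn once
                    block = sum(self.x[pos:pos + (1 << level)])
                    # each item sits in L + 1 blocks, so split eps across levels
                    self.cache[key] = block + self.rng.laplace(scale=(self.L + 1) / self.eps)
                total += self.cache[key]
                pos += 1 << level
        return total

c = PrivateCounter(L=10, eps=1.0)
rng = np.random.default_rng(1)
for b in rng.integers(0, 2, size=1000):
    c.update(int(b))
print(c.count())   # near the true count; error grows only polylogarithmically in T
```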
{"title":"Differentially Private Synthetic High-dimensional Tabular Stream","authors":"Girish Kumar, Thomas Strohmer, Roman Vershynin","doi":"arxiv-2409.00322","DOIUrl":"https://doi.org/arxiv-2409.00322","url":null,"abstract":"While differentially private synthetic data generation has been explored\u0000extensively in the literature, how to update this data in the future if the\u0000underlying private data changes is much less understood. We propose an\u0000algorithmic framework for streaming data that generates multiple synthetic\u0000datasets over time, tracking changes in the underlying private data. Our\u0000algorithm satisfies differential privacy for the entire input stream (continual\u0000differential privacy) and can be used for high-dimensional tabular data.\u0000Furthermore, we show the utility of our method via experiments on real-world\u0000datasets. The proposed algorithm builds upon a popular select, measure, fit,\u0000and iterate paradigm (used by offline synthetic data generation algorithms) and\u0000private counters for streams.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this short report we present adaptive estimation of functional smoothness, in the Hilbert-space-norm sense, for the three classical problems of nonparametric statistics: regression, density estimation, and spectral density estimation.
{"title":"Adaptive smoothness of function estimation in the three classical problems of the non-parametrical statistic in the three classical problems of the non-parametrical statistic","authors":"M. R. Formica, E. Ostrovsky, L. Sirota","doi":"arxiv-2409.00491","DOIUrl":"https://doi.org/arxiv-2409.00491","url":null,"abstract":"We offer in this short report the so-called adaptive functional smoothness\u0000estimation in the Hilbert space norm sense in the three classical problems of\u0000non-parametrical statistic: regression, density and spectral (density) function\u0000measurement (estimation).","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"140 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elliptical distributions are a simple and flexible class of distributions that depend on a one-dimensional function, called the density generator. In this article, we study the non-parametric estimator of this generator that was introduced by Liebscher (2005). This estimator depends on two tuning parameters: a bandwidth $h$, as usual in kernel smoothing, and an additional parameter $a$ that controls the behavior near the center of the distribution. We give an explicit expression for the asymptotic MSE at a point $x$, and derive explicit expressions for the optimal tuning parameters $h$ and $a$. Estimation of the derivatives of the generator is also discussed. A simulation study shows the performance of the new methods.
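For orientation, the shape of such results in plain kernel smoothing is the pointwise bias-variance trade-off displayed below; the paper derives the analogous, more involved expressions for Liebscher's estimator, where the second parameter $a$ also enters. This generic display is standard kernel-smoothing material, not the paper's exact formulas.

```latex
\mathrm{AMSE}(x;h)
  = \underbrace{\tfrac{1}{4}\, h^{4}\, \mu_2(K)^{2}\, g''(x)^{2}}_{\text{squared bias}}
  + \underbrace{\frac{R(K)\, g(x)}{n h}}_{\text{variance}},
\qquad
h_{\mathrm{opt}}(x)
  = \left( \frac{R(K)\, g(x)}{\mu_2(K)^{2}\, g''(x)^{2}\, n} \right)^{1/5},
```

where $\mu_2(K) = \int u^{2} K(u)\,du$ and $R(K) = \int K(u)^{2}\,du$; balancing the two terms in $h$ yields the usual $n^{-4/5}$ pointwise AMSE rate.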
{"title":"On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator","authors":"Victor Ryan, Alexis Derumigny","doi":"arxiv-2408.17087","DOIUrl":"https://doi.org/arxiv-2408.17087","url":null,"abstract":"Elliptical distributions are a simple and flexible class of distributions\u0000that depend on a one-dimensional function, called the density generator. In\u0000this article, we study the non-parametric estimator of this generator that was\u0000introduced by Liebscher (2005). This estimator depends on two tuning\u0000parameters: a bandwidth $h$ -- as usual in kernel smoothing -- and an\u0000additional parameter $a$ that control the behavior near the center of the\u0000distribution. We give an explicit expression for the asymptotic MSE at a point\u0000$x$, and derive explicit expressions for the optimal tuning parameters $h$ and\u0000$a$. Estimation of the derivatives of the generator is also discussed. A\u0000simulation study shows the performance of the new methods.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Change-points in functional time series can be detected using the CUSUM statistic, which is a non-linear functional of the partial sum process. Various methods have been proposed to obtain critical values for this statistic. In this paper we use the functional autoregressive sieve bootstrap to imitate the behavior of the partial sum process, and we show that this procedure yields asymptotically correct estimates of the critical values under the null hypothesis. We also establish the consistency of the corresponding bootstrap-based test under local alternatives. The finite sample performance of the procedure is studied via simulations under the null hypothesis and under the alternative.
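Concretely, for curves observed on a common grid the statistic in question is the maximal norm of the centred and scaled partial sum process; the bootstrap's job is to supply its critical values. A sketch of the statistic alone, with a grid-based $L^2$ norm as our discretization:

```python
import numpy as np

def cusum_statistic(X):
    """X: array (n, m), n functional observations on a grid of m points.
    Returns max_k n^{-1/2} || S_k - (k / n) S_n ||_{L2}."""
    n = X.shape[0]
    S = np.cumsum(X, axis=0)                   # partial sum process S_k
    k = np.arange(1, n + 1)[:, None]
    D = (S - (k / n) * S[-1]) / np.sqrt(n)     # centred, scaled partial sums
    norms = np.sqrt(np.mean(D ** 2, axis=1))   # L2 norm approximated on the grid
    return norms.max()
```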
{"title":"Functional Sieve Bootstrap for the Partial Sum Process with Application to Change-Point Detection without Dimension Reduction","authors":"Efstathios Paparoditis, Lea Wegner, Martin Wendler","doi":"arxiv-2408.05071","DOIUrl":"https://doi.org/arxiv-2408.05071","url":null,"abstract":"Change-points in functional time series can be detected using the\u0000CUSUM-statistic, which is a non-linear functional of the partial sum process.\u0000Various methods have been proposed to obtain critical values for this\u0000statistic. In this paper we use the functional autoregressive sieve bootstrap\u0000to imitate the behavior of the partial sum process and we show that this\u0000procedure asymptotically correct estimates critical values under the null\u0000hypothesis. We also establish the consistency of the corresponding bootstrap\u0000based test under local alternatives. The finite sample performance of the\u0000procedure is studied via simulations under the null -hypothesis and under the\u0000alternative.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study the common least-squares minimization approach is compared to the Bayesian updating procedure. In the context of material parameter identification, the posterior parameter density function is obtained from its prior and the likelihood function of the measurements. By using Markov Chain Monte Carlo methods, such as the Metropolis-Hastings algorithm (Hastings 1970), the global density function, including local peaks, can be computed. This procedure thus enables an accurate evaluation of the global parameter quality. However, the computational effort is remarkably larger than for the minimization approach. Therefore, several methodologies for an efficient approximation of the likelihood function are discussed in the present study.
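A minimal random-walk Metropolis sampler of the kind compared in the study is sketched below; the forward model, prior box, and noise level are toy placeholders of ours, standing in for an expensive constitutive model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, sigma = np.array([1.1, 1.9]), 0.1         # toy measurements and error level

def G(theta):                                    # placeholder forward model
    return np.array([theta[0], theta[0] + theta[1]])

def log_post(theta):
    if np.any(theta < 0) or np.any(theta > 5):   # uniform prior on a box
        return -np.inf
    r = d_obs - G(theta)
    return -0.5 * np.sum(r ** 2) / sigma ** 2    # Gaussian log-likelihood

theta = np.array([0.5, 0.5])
lp, chain = log_post(theta), []
for _ in range(20000):
    prop = theta + 0.1 * rng.standard_normal(2)  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)
post = np.array(chain[5000:])                    # discard burn-in
print("posterior mean:", post.mean(axis=0), "posterior sd:", post.std(axis=0))
```

Every chain step costs one forward-model evaluation, which is exactly why the study discusses cheaper approximations of the likelihood function.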
{"title":"Identification of the parameters of complex constitutive models: Least squares minimization vs. Bayesian updating","authors":"Thomas Most","doi":"arxiv-2408.04928","DOIUrl":"https://doi.org/arxiv-2408.04928","url":null,"abstract":"In this study the common least-squares minimization approach is compared to\u0000the Bayesian updating procedure. In the content of material parameter\u0000identification the posterior parameter density function is obtained from its\u0000prior and the likelihood function of the measurements. By using Markov Chain\u0000Monte Carlo methods, such as the Metropolis-Hastings algorithm\u0000cite{Hastings1970}, the global density function including local peaks can be\u0000computed. Thus this procedure enables an accurate evaluation of the global\u0000parameter quality. However, the computational effort is remarkable larger\u0000compared to the minimization approach. Thus several methodologies for an\u0000efficient approximation of the likelihood function are discussed in the present\u0000study.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose an extension of the classical Sobol' estimator for the estimation of variance-based sensitivity indices. The approach assumes a linear correlation model between the input variables, which is used to decompose the contribution of an input variable into a correlated and an uncorrelated part. This method provides sampling matrices that follow the original joint probability distribution and that are used directly to compute the model output, without any assumptions or approximations of the model response function.
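An illustrative decomposition in the spirit of the abstract: under a linear correlation model, split $X_1$ into the part explained by the other input and an uncorrelated residual, then compare first-order indices for both. The regression split and the binned correlation-ratio estimator below are our own simple stand-ins, not the proposed estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
L = np.linalg.cholesky(np.array([[1.0, 0.6], [0.6, 1.0]]))
X = rng.standard_normal((n, 2)) @ L.T                        # correlated inputs
Y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)   # toy model output

def first_order_index(u, y, bins=50):
    """Correlation-ratio estimate of S = Var(E[Y|U]) / Var(Y)."""
    edges = np.quantile(u, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.digitize(u, edges)
    means = np.array([y[idx == b].mean() for b in range(bins)])
    sizes = np.array([(idx == b).sum() for b in range(bins)])
    return np.sum(sizes * (means - y.mean()) ** 2) / (len(y) * y.var())

beta = np.cov(X[:, 0], X[:, 1])[0, 1] / X[:, 1].var()        # linear correlation model
resid = X[:, 0] - beta * X[:, 1]                             # uncorrelated part of X_1
print("index of X_1 (with correlation):", first_order_index(X[:, 0], Y))
print("index of uncorrelated part     :", first_order_index(resid, Y))
```

The gap between the two numbers is the correlated contribution that the proposed decomposition makes explicit.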
{"title":"Variance-based sensitivity analysis in the presence of correlated input variables","authors":"Thomas Most","doi":"arxiv-2408.04933","DOIUrl":"https://doi.org/arxiv-2408.04933","url":null,"abstract":"In this paper we propose an extension of the classical Sobol' estimator for\u0000the estimation of variance based sensitivity indices. The approach assumes a\u0000linear correlation model between the input variables which is used to decompose\u0000the contribution of an input variable into a correlated and an uncorrelated\u0000part. This method provides sampling matrices following the original joint\u0000probability distribution which are used directly to compute the model output\u0000without any assumptions or approximations of the model response function.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a nested family of Bayesian nonparametric models for network and interaction data with a hierarchical granularity structure that naturally arises through finer and coarser population labelings. In the case of network data, the structure is easily visualized by merging and shattering vertices, while respecting the edge structure. We further develop Bayesian inference procedures for the model family, and apply them to synthetic and real data. The family provides a connection of practical and theoretical interest between the Hollywood model of Crane and Dempsey, and the generalized-gamma graphex model of Caron and Fox. A key ingredient for the construction of the family is fragmentation and coagulation duality for integer partitions, and for this we develop novel duality relations that generalize those of Pitman and Dong, Goldschmidt and Martin. The duality is also crucially used in our inferential procedures.
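For readers unfamiliar with the Hollywood model side of that connection, a short simulation in its spirit is given below: each endpoint of each new edge is drawn from a Pitman-Yor urn on vertex degrees, so popular vertices keep attracting edges while new vertices keep appearing. Parameter names and defaults are our own choices.

```python
import numpy as np

def hollywood(n_edges, alpha=0.5, theta=1.0, seed=0):
    """Edge-exchangeable random network via a Pitman-Yor urn on degrees."""
    rng = np.random.default_rng(seed)
    deg, edges = [], []
    for _ in range(n_edges):
        pair = []
        for _ in range(2):                     # binary edges for simplicity
            # existing vertex v has weight deg[v] - alpha,
            # a brand-new vertex has weight theta + alpha * (#vertices)
            w = np.array([d - alpha for d in deg] + [theta + alpha * len(deg)])
            v = rng.choice(len(w), p=w / w.sum())
            if v == len(deg):
                deg.append(0)                  # new vertex enters the population
            deg[v] += 1
            pair.append(v)
        edges.append(tuple(pair))
    return edges, deg

edges, deg = hollywood(2000)
print(len(deg), "vertices, max degree", max(deg))   # heavy-tailed degree sequence
```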
{"title":"Network and interaction models for data with hierarchical granularity via fragmentation and coagulation","authors":"Lancelot F. James, Juho Lee, Nathan Ross","doi":"arxiv-2408.04866","DOIUrl":"https://doi.org/arxiv-2408.04866","url":null,"abstract":"We introduce a nested family of Bayesian nonparametric models for network and\u0000interaction data with a hierarchical granularity structure that naturally\u0000arises through finer and coarser population labelings. In the case of network\u0000data, the structure is easily visualized by merging and shattering vertices,\u0000while respecting the edge structure. We further develop Bayesian inference\u0000procedures for the model family, and apply them to synthetic and real data. The\u0000family provides a connection of practical and theoretical interest between the\u0000Hollywood model of Crane and Dempsey, and the generalized-gamma graphex model\u0000of Caron and Fox. A key ingredient for the construction of the family is\u0000fragmentation and coagulation duality for integer partitions, and for this we\u0000develop novel duality relations that generalize those of Pitman and Dong,\u0000Goldschmidt and Martin. The duality is also crucially used in our inferential\u0000procedures.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"126 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}