This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions. We extend the analysis of maximum margin classifiers to $\alpha$ sub-exponential distributions, where $\alpha \in (0,2]$, generalizing previous work that focused on sub-gaussian inputs. Our main result provides generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. We prove that under certain conditions on the dimensionality $p$ and feature vector magnitude $\|\mu\|$, the misclassification error of the maximum margin classifier asymptotically approaches the noise level. This work contributes to the understanding of benign overfitting in more robust distribution settings and demonstrates that the phenomenon persists even with heavier-tailed inputs than previously studied.
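As a rough numerical illustration of this setting (a sketch with toy constants of ours, not the paper's construction or conditions): the script below draws $n \ll p$ samples whose class means are $\pm\mu$ with additive Laplace noise (Laplace tails are $\alpha$ sub-exponential with $\alpha = 1$, heavier than gaussian), flips a fraction of the labels, and runs gradient descent on the unregularized logistic loss, which is known to converge in direction to the maximum margin classifier (Soudry et al. 2018).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, noise = 50, 2000, 0.1        # overparameterized regime: n << p
mu = np.zeros(p); mu[0] = 10.0     # mean direction, ||mu|| = 10

def sample(m):
    y = rng.choice([-1, 1], size=m)                  # clean labels
    x = np.outer(y, mu) + rng.laplace(size=(m, p))   # heavy-tailed inputs
    flip = rng.random(m) < noise                     # label noise at rate `noise`
    return x, np.where(flip, -y, y)

X, y = sample(n)
w = np.zeros(p)
for _ in range(20000):             # GD on unregularized logistic loss
    margins = np.clip(y * (X @ w), -30, 30)
    w += 0.01 / n * X.T @ (y / (1 + np.exp(margins)))

Xt, yt = sample(5000)
print("train error:", np.mean(np.sign(X @ w) != y))    # ideally 0: interpolation
print("test error :", np.mean(np.sign(Xt @ w) != yt))  # close to the noise level
```

The point of the experiment is benign overfitting in miniature: the classifier interpolates the noisy training labels, yet its test error sits near the label-noise level rather than blowing up.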
{"title":"Benign Overfitting for $α$ Sub-exponential Input","authors":"Kota Okudo, Kei Kobayashi","doi":"arxiv-2409.00733","DOIUrl":"https://doi.org/arxiv-2409.00733","url":null,"abstract":"This paper investigates the phenomenon of benign overfitting in binary\u0000classification problems with heavy-tailed input distributions. We extend the\u0000analysis of maximum margin classifiers to $alpha$ sub-exponential\u0000distributions, where $alpha in (0,2]$, generalizing previous work that\u0000focused on sub-gaussian inputs. Our main result provides generalization error\u0000bounds for linear classifiers trained using gradient descent on unregularized\u0000logistic loss in this heavy-tailed setting. We prove that under certain\u0000conditions on the dimensionality $p$ and feature vector magnitude $|mu|$,\u0000the misclassification error of the maximum margin classifier asymptotically\u0000approaches the noise level. This work contributes to the understanding of\u0000benign overfitting in more robust distribution settings and demonstrates that\u0000the phenomenon persists even with heavier-tailed inputs than previously\u0000studied.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bi-factor analysis is a form of confirmatory factor analysis widely used in psychological and educational measurement. The use of a bi-factor model requires the specification of an explicit bi-factor structure on the relationship between the observed variables and the group factors. In practice, the bi-factor structure is sometimes unknown, in which case an exploratory form of bi-factor analysis is needed to find the bi-factor structure. Unfortunately, there are few methods for exploratory bi-factor analysis, with the exception of a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However, this method only finds approximate bi-factor structures, as it does not yield an exact bi-factor loading structure, even after applying hard thresholding. In this paper, we propose a constraint-based optimisation method that learns an exact bi-factor loading structure from data, overcoming the issue with the rotation-based method. The key to the proposed method is a mathematical characterisation of the bi-factor loading structure as a set of equality constraints, which allows us to formulate the exploratory bi-factor analysis problem as a constrained optimisation problem in a continuous domain and solve the optimisation problem with an augmented Lagrangian method. The power of the proposed method is shown via simulation studies and a real data example. Extending the proposed method to exploratory hierarchical factor analysis is also discussed. The code is available at https://anonymous.4open.science/r/Bifactor-ALM-C1E6.
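To make the optimisation machinery concrete, here is a minimal augmented Lagrangian (method of multipliers) loop for a generic equality-constrained problem. The toy loss and the product-equals-zero constraint, loosely mimicking "each observed variable loads on at most one group factor", are placeholders of ours, not the authors' characterisation.

```python
import numpy as np

def augmented_lagrangian(grad_f, c, jac_c, x0, rho=10.0, outer=40, inner=300, lr=1e-2):
    """Minimize f(x) subject to c(x) = 0 by the method of multipliers."""
    x = x0.copy()
    lam = np.zeros_like(c(x0))                 # Lagrange multiplier estimates
    for _ in range(outer):
        for _ in range(inner):                 # approximately minimize L_rho(x, lam)
            x -= lr * (grad_f(x) + jac_c(x).T @ (lam + rho * c(x)))
        lam += rho * c(x)                      # dual update on the multipliers
    return x

# Toy problem: min ||x - b||^2  subject to  x[0] * x[1] = 0
b = np.array([1.0, 0.8, -0.5])
grad_f = lambda x: 2 * (x - b)
c = lambda x: np.array([x[0] * x[1]])
jac_c = lambda x: np.array([[x[1], x[0], 0.0]])
print(augmented_lagrangian(grad_f, c, jac_c, np.zeros(3)))  # approx [1.0, 0.0, -0.5]
```

The constrained solution zeroes out the smaller loading exactly, which is the qualitative behaviour that distinguishes an exact method from rotation followed by hard thresholding.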
{"title":"Exact Exploratory Bi-factor Analysis: A Constraint-based Optimisation Approach","authors":"Jiawei Qiao, Yunxiao Chen, Zhiliang Ying","doi":"arxiv-2409.00679","DOIUrl":"https://doi.org/arxiv-2409.00679","url":null,"abstract":"Bi-factor analysis is a form of confirmatory factor analysis widely used in\u0000psychological and educational measurement. The use of a bi-factor model\u0000requires the specification of an explicit bi-factor structure on the\u0000relationship between the observed variables and the group factors. In practice,\u0000the bi-factor structure is sometimes unknown, in which case an exploratory form\u0000of bi-factor analysis is needed to find the bi-factor structure. Unfortunately,\u0000there are few methods for exploratory bi-factor analysis, with the exception of\u0000a rotation-based method proposed in Jennrich and Bentler (2011, 2012). However,\u0000this method only finds approximate bi-factor structures, as it does not yield\u0000an exact bi-factor loading structure, even after applying hard thresholding. In\u0000this paper, we propose a constraint-based optimisation method that learns an\u0000exact bi-factor loading structure from data, overcoming the issue with the\u0000rotation-based method. The key to the proposed method is a mathematical\u0000characterisation of the bi-factor loading structure as a set of equality\u0000constraints, which allows us to formulate the exploratory bi-factor analysis\u0000problem as a constrained optimisation problem in a continuous domain and solve\u0000the optimisation problem with an augmented Lagrangian method. The power of the\u0000proposed method is shown via simulation studies and a real data example.\u0000Extending the proposed method to exploratory hierarchical factor analysis is\u0000also discussed. The codes are available on\u0000``https://anonymous.4open.science/r/Bifactor-ALM-C1E6\".","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"88 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce directional regularity, a new definition of anisotropy for multivariate functional data. Instead of taking the conventional view, which treats anisotropy as a notion of smoothness along each coordinate dimension, directional regularity additionally views anisotropy through the lens of directions. We show that faster rates of convergence can be obtained through a change of basis by adapting to the directional regularity of a multivariate process. An algorithm for the estimation and identification of the change-of-basis matrix is constructed, made possible by the unique replication structure of functional data. Non-asymptotic bounds are provided for our algorithm, supplemented by numerical evidence from an extensive simulation study. We discuss two possible applications of the directional regularity approach, and advocate its consideration as a standard pre-processing step in multivariate functional data analysis.
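As a toy rendering of direction-dependent smoothness (our own illustration, not the authors' change-of-basis algorithm): with replicated surfaces one can estimate a Hurst-type exponent per coordinate direction by regressing log mean-squared increments on log lag. Anisotropy shows up as unequal exponents; the paper's point is that a well-chosen basis makes the favourable direction available to the estimator.

```python
import numpy as np

def directional_exponents(samples, lags=(1, 2, 4, 8)):
    """samples: array (n_rep, m, m) of replicated surfaces on an m x m grid.
    Returns a smoothness exponent H_d per coordinate direction, using
    E|X(t + l * e_d) - X(t)|^2 ~ l^(2 * H_d) and a log-log regression."""
    H = []
    for axis in (1, 2):                        # the two coordinate directions
        m = samples.shape[axis]
        logs = []
        for ell in lags:
            a = np.take(samples, range(ell, m), axis=axis)
            b = np.take(samples, range(m - ell), axis=axis)
            logs.append(np.log(np.mean((a - b) ** 2)))
        slope = np.polyfit(np.log(lags), logs, 1)[0]
        H.append(slope / 2)
    return H
```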
{"title":"Structural adaptation via directional regularity: rate accelerated estimation in multivariate functional data","authors":"Omar Kassi, Sunny G. W. Wang","doi":"arxiv-2409.00817","DOIUrl":"https://doi.org/arxiv-2409.00817","url":null,"abstract":"We introduce directional regularity, a new definition of anisotropy for\u0000multivariate functional data. Instead of taking the conventional view which\u0000determines anisotropy as a notion of smoothness along a dimension, directional\u0000regularity additionally views anisotropy through the lens of directions. We\u0000show that faster rates of convergence can be obtained through a change-of-basis\u0000by adapting to the directional regularity of a multivariate process. An\u0000algorithm for the estimation and identification of the change-of-basis matrix\u0000is constructed, made possible due to the unique replication structure of\u0000functional data. Non-asymptotic bounds are provided for our algorithm,\u0000supplemented by numerical evidence from an extensive simulation study. We\u0000discuss two possible applications of the directional regularity approach, and\u0000advocate its consideration as a standard pre-processing step in multivariate\u0000functional data analysis.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While differentially private synthetic data generation has been explored extensively in the literature, how to update this data in the future if the underlying private data changes is much less understood. We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time, tracking changes in the underlying private data. Our algorithm satisfies differential privacy for the entire input stream (continual differential privacy) and can be used for high-dimensional tabular data. Furthermore, we show the utility of our method via experiments on real-world datasets. The proposed algorithm builds upon a popular select, measure, fit, and iterate paradigm (used by offline synthetic data generation algorithms) and private counters for streams.
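The "private counters for streams" ingredient has a classic construction, the binary-tree mechanism (Dwork et al. 2010; Chan, Shi and Song 2011): each dyadic block of the stream receives Laplace noise once, and any running count is assembled from at most logarithmically many blocks. The sketch below is that textbook primitive with naming of our own choosing; it is not the paper's full select-measure-fit-iterate framework.

```python
import numpy as np

class PrivateCounter:
    """eps-DP running counts of a 0/1 stream of length < 2**(L + 1)."""
    def __init__(self, L, eps, seed=0):
        self.L, self.eps = L, eps
        self.rng = np.random.default_rng(seed)
        self.cache = {}                   # (level, block) -> noisy dyadic block sum
        self.x = []

    def update(self, bit):
        self.x.append(bit)

    def count(self):
        """Noisy count of all items seen so far."""
        t, pos, total = len(self.x), 0, 0.0
        for level in range(self.L, -1, -1):    # binary decomposition of [0, t)
            if t & (1 << level):
                key = (level, pos >> level)
                if key not in self.cache:      # each block's noise is drawn once
                    block = sum(self.x[pos:pos + (1 << level)])
                    # each item sits in L + 1 blocks, so split eps across levels
                    self.cache[key] = block + self.rng.laplace(scale=(self.L + 1) / self.eps)
                total += self.cache[key]
                pos += 1 << level
        return total

c = PrivateCounter(L=10, eps=1.0)
rng = np.random.default_rng(1)
for b in rng.integers(0, 2, size=1000):
    c.update(int(b))
print(c.count())   # near the true count; error grows only polylogarithmically in T
```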
{"title":"Differentially Private Synthetic High-dimensional Tabular Stream","authors":"Girish Kumar, Thomas Strohmer, Roman Vershynin","doi":"arxiv-2409.00322","DOIUrl":"https://doi.org/arxiv-2409.00322","url":null,"abstract":"While differentially private synthetic data generation has been explored\u0000extensively in the literature, how to update this data in the future if the\u0000underlying private data changes is much less understood. We propose an\u0000algorithmic framework for streaming data that generates multiple synthetic\u0000datasets over time, tracking changes in the underlying private data. Our\u0000algorithm satisfies differential privacy for the entire input stream (continual\u0000differential privacy) and can be used for high-dimensional tabular data.\u0000Furthermore, we show the utility of our method via experiments on real-world\u0000datasets. The proposed algorithm builds upon a popular select, measure, fit,\u0000and iterate paradigm (used by offline synthetic data generation algorithms) and\u0000private counters for streams.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this short report we present adaptive estimation of functional smoothness, in the Hilbert-space-norm sense, for the three classical problems of nonparametric statistics: regression, density estimation, and spectral density estimation.
{"title":"Adaptive smoothness of function estimation in the three classical problems of the non-parametrical statistic in the three classical problems of the non-parametrical statistic","authors":"M. R. Formica, E. Ostrovsky, L. Sirota","doi":"arxiv-2409.00491","DOIUrl":"https://doi.org/arxiv-2409.00491","url":null,"abstract":"We offer in this short report the so-called adaptive functional smoothness\u0000estimation in the Hilbert space norm sense in the three classical problems of\u0000non-parametrical statistic: regression, density and spectral (density) function\u0000measurement (estimation).","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"140 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elliptical distributions are a simple and flexible class of distributions that depend on a one-dimensional function, called the density generator. In this article, we study the non-parametric estimator of this generator that was introduced by Liebscher (2005). This estimator depends on two tuning parameters: a bandwidth $h$, as usual in kernel smoothing, and an additional parameter $a$ that controls the behavior near the center of the distribution. We give an explicit expression for the asymptotic MSE at a point $x$, and derive explicit expressions for the optimal tuning parameters $h$ and $a$. Estimation of the derivatives of the generator is also discussed. A simulation study shows the performance of the new methods.
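For orientation, the shape of such results in plain kernel smoothing is the pointwise bias-variance trade-off displayed below; the paper derives the analogous, more involved expressions for Liebscher's estimator, where the second parameter $a$ also enters. This generic display is standard kernel-smoothing material, not the paper's exact formulas.

```latex
\mathrm{AMSE}(x;h)
  = \underbrace{\tfrac{1}{4}\, h^{4}\, \mu_2(K)^{2}\, g''(x)^{2}}_{\text{squared bias}}
  + \underbrace{\frac{R(K)\, g(x)}{n h}}_{\text{variance}},
\qquad
h_{\mathrm{opt}}(x)
  = \left( \frac{R(K)\, g(x)}{\mu_2(K)^{2}\, g''(x)^{2}\, n} \right)^{1/5},
```

where $\mu_2(K) = \int u^{2} K(u)\,du$ and $R(K) = \int K(u)^{2}\,du$; balancing the two terms in $h$ yields the usual $n^{-4/5}$ pointwise AMSE rate.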
{"title":"On the choice of the two tuning parameters for nonparametric estimation of an elliptical distribution generator","authors":"Victor Ryan, Alexis Derumigny","doi":"arxiv-2408.17087","DOIUrl":"https://doi.org/arxiv-2408.17087","url":null,"abstract":"Elliptical distributions are a simple and flexible class of distributions\u0000that depend on a one-dimensional function, called the density generator. In\u0000this article, we study the non-parametric estimator of this generator that was\u0000introduced by Liebscher (2005). This estimator depends on two tuning\u0000parameters: a bandwidth $h$ -- as usual in kernel smoothing -- and an\u0000additional parameter $a$ that control the behavior near the center of the\u0000distribution. We give an explicit expression for the asymptotic MSE at a point\u0000$x$, and derive explicit expressions for the optimal tuning parameters $h$ and\u0000$a$. Estimation of the derivatives of the generator is also discussed. A\u0000simulation study shows the performance of the new methods.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142192663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Change-points in functional time series can be detected using the CUSUM statistic, which is a non-linear functional of the partial sum process. Various methods have been proposed to obtain critical values for this statistic. In this paper we use the functional autoregressive sieve bootstrap to imitate the behavior of the partial sum process, and we show that this procedure yields asymptotically correct estimates of the critical values under the null hypothesis. We also establish the consistency of the corresponding bootstrap-based test under local alternatives. The finite sample performance of the procedure is studied via simulations under the null hypothesis and under the alternative.
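Concretely, for curves observed on a common grid the statistic in question is the maximal norm of the centred and scaled partial sum process; the bootstrap's job is to supply its critical values. A sketch of the statistic alone, with a grid-based $L^2$ norm as our discretization:

```python
import numpy as np

def cusum_statistic(X):
    """X: array (n, m), n functional observations on a grid of m points.
    Returns max_k n^{-1/2} || S_k - (k / n) S_n ||_{L2}."""
    n = X.shape[0]
    S = np.cumsum(X, axis=0)                   # partial sum process S_k
    k = np.arange(1, n + 1)[:, None]
    D = (S - (k / n) * S[-1]) / np.sqrt(n)     # centred, scaled partial sums
    norms = np.sqrt(np.mean(D ** 2, axis=1))   # L2 norm approximated on the grid
    return norms.max()
```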
{"title":"Functional Sieve Bootstrap for the Partial Sum Process with Application to Change-Point Detection without Dimension Reduction","authors":"Efstathios Paparoditis, Lea Wegner, Martin Wendler","doi":"arxiv-2408.05071","DOIUrl":"https://doi.org/arxiv-2408.05071","url":null,"abstract":"Change-points in functional time series can be detected using the\u0000CUSUM-statistic, which is a non-linear functional of the partial sum process.\u0000Various methods have been proposed to obtain critical values for this\u0000statistic. In this paper we use the functional autoregressive sieve bootstrap\u0000to imitate the behavior of the partial sum process and we show that this\u0000procedure asymptotically correct estimates critical values under the null\u0000hypothesis. We also establish the consistency of the corresponding bootstrap\u0000based test under local alternatives. The finite sample performance of the\u0000procedure is studied via simulations under the null -hypothesis and under the\u0000alternative.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study the common least-squares minimization approach is compared to the Bayesian updating procedure. In the context of material parameter identification, the posterior parameter density function is obtained from its prior and the likelihood function of the measurements. By using Markov Chain Monte Carlo methods, such as the Metropolis-Hastings algorithm (Hastings 1970), the global density function, including local peaks, can be computed. This procedure thus enables an accurate evaluation of the global parameter quality. However, the computational effort is remarkably larger than for the minimization approach. Therefore, several methodologies for an efficient approximation of the likelihood function are discussed in the present study.
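A minimal random-walk Metropolis sampler of the kind compared in the study is sketched below; the forward model, prior box, and noise level are toy placeholders of ours, standing in for an expensive constitutive model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, sigma = np.array([1.1, 1.9]), 0.1         # toy measurements and error level

def G(theta):                                    # placeholder forward model
    return np.array([theta[0], theta[0] + theta[1]])

def log_post(theta):
    if np.any(theta < 0) or np.any(theta > 5):   # uniform prior on a box
        return -np.inf
    r = d_obs - G(theta)
    return -0.5 * np.sum(r ** 2) / sigma ** 2    # Gaussian log-likelihood

theta = np.array([0.5, 0.5])
lp, chain = log_post(theta), []
for _ in range(20000):
    prop = theta + 0.1 * rng.standard_normal(2)  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)
post = np.array(chain[5000:])                    # discard burn-in
print("posterior mean:", post.mean(axis=0), "posterior sd:", post.std(axis=0))
```

Every chain step costs one forward-model evaluation, which is exactly why the study discusses cheaper approximations of the likelihood function.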
{"title":"Identification of the parameters of complex constitutive models: Least squares minimization vs. Bayesian updating","authors":"Thomas Most","doi":"arxiv-2408.04928","DOIUrl":"https://doi.org/arxiv-2408.04928","url":null,"abstract":"In this study the common least-squares minimization approach is compared to\u0000the Bayesian updating procedure. In the content of material parameter\u0000identification the posterior parameter density function is obtained from its\u0000prior and the likelihood function of the measurements. By using Markov Chain\u0000Monte Carlo methods, such as the Metropolis-Hastings algorithm\u0000cite{Hastings1970}, the global density function including local peaks can be\u0000computed. Thus this procedure enables an accurate evaluation of the global\u0000parameter quality. However, the computational effort is remarkable larger\u0000compared to the minimization approach. Thus several methodologies for an\u0000efficient approximation of the likelihood function are discussed in the present\u0000study.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose an extension of the classical Sobol' estimator for the estimation of variance-based sensitivity indices. The approach assumes a linear correlation model between the input variables, which is used to decompose the contribution of an input variable into a correlated and an uncorrelated part. This method provides sampling matrices that follow the original joint probability distribution and that are used directly to compute the model output, without any assumptions or approximations of the model response function.
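An illustrative decomposition in the spirit of the abstract: under a linear correlation model, split $X_1$ into the part explained by the other input and an uncorrelated residual, then compare first-order indices for both. The regression split and the binned correlation-ratio estimator below are our own simple stand-ins, not the proposed estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
L = np.linalg.cholesky(np.array([[1.0, 0.6], [0.6, 1.0]]))
X = rng.standard_normal((n, 2)) @ L.T                        # correlated inputs
Y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)   # toy model output

def first_order_index(u, y, bins=50):
    """Correlation-ratio estimate of S = Var(E[Y|U]) / Var(Y)."""
    edges = np.quantile(u, np.linspace(0, 1, bins + 1)[1:-1])
    idx = np.digitize(u, edges)
    means = np.array([y[idx == b].mean() for b in range(bins)])
    sizes = np.array([(idx == b).sum() for b in range(bins)])
    return np.sum(sizes * (means - y.mean()) ** 2) / (len(y) * y.var())

beta = np.cov(X[:, 0], X[:, 1])[0, 1] / X[:, 1].var()        # linear correlation model
resid = X[:, 0] - beta * X[:, 1]                             # uncorrelated part of X_1
print("index of X_1 (with correlation):", first_order_index(X[:, 0], Y))
print("index of uncorrelated part     :", first_order_index(resid, Y))
```

The gap between the two numbers is the correlated contribution that the proposed decomposition makes explicit.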
{"title":"Variance-based sensitivity analysis in the presence of correlated input variables","authors":"Thomas Most","doi":"arxiv-2408.04933","DOIUrl":"https://doi.org/arxiv-2408.04933","url":null,"abstract":"In this paper we propose an extension of the classical Sobol' estimator for\u0000the estimation of variance based sensitivity indices. The approach assumes a\u0000linear correlation model between the input variables which is used to decompose\u0000the contribution of an input variable into a correlated and an uncorrelated\u0000part. This method provides sampling matrices following the original joint\u0000probability distribution which are used directly to compute the model output\u0000without any assumptions or approximations of the model response function.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a nested family of Bayesian nonparametric models for network and interaction data with a hierarchical granularity structure that naturally arises through finer and coarser population labelings. In the case of network data, the structure is easily visualized by merging and shattering vertices, while respecting the edge structure. We further develop Bayesian inference procedures for the model family, and apply them to synthetic and real data. The family provides a connection of practical and theoretical interest between the Hollywood model of Crane and Dempsey, and the generalized-gamma graphex model of Caron and Fox. A key ingredient for the construction of the family is fragmentation and coagulation duality for integer partitions, and for this we develop novel duality relations that generalize those of Pitman and Dong, Goldschmidt and Martin. The duality is also crucially used in our inferential procedures.
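For readers unfamiliar with the Hollywood model side of that connection, a short simulation in its spirit is given below: each endpoint of each new edge is drawn from a Pitman-Yor urn on vertex degrees, so popular vertices keep attracting edges while new vertices keep appearing. Parameter names and defaults are our own choices.

```python
import numpy as np

def hollywood(n_edges, alpha=0.5, theta=1.0, seed=0):
    """Edge-exchangeable random network via a Pitman-Yor urn on degrees."""
    rng = np.random.default_rng(seed)
    deg, edges = [], []
    for _ in range(n_edges):
        pair = []
        for _ in range(2):                     # binary edges for simplicity
            # existing vertex v has weight deg[v] - alpha,
            # a brand-new vertex has weight theta + alpha * (#vertices)
            w = np.array([d - alpha for d in deg] + [theta + alpha * len(deg)])
            v = rng.choice(len(w), p=w / w.sum())
            if v == len(deg):
                deg.append(0)                  # new vertex enters the population
            deg[v] += 1
            pair.append(v)
        edges.append(tuple(pair))
    return edges, deg

edges, deg = hollywood(2000)
print(len(deg), "vertices, max degree", max(deg))   # heavy-tailed degree sequence
```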
{"title":"Network and interaction models for data with hierarchical granularity via fragmentation and coagulation","authors":"Lancelot F. James, Juho Lee, Nathan Ross","doi":"arxiv-2408.04866","DOIUrl":"https://doi.org/arxiv-2408.04866","url":null,"abstract":"We introduce a nested family of Bayesian nonparametric models for network and\u0000interaction data with a hierarchical granularity structure that naturally\u0000arises through finer and coarser population labelings. In the case of network\u0000data, the structure is easily visualized by merging and shattering vertices,\u0000while respecting the edge structure. We further develop Bayesian inference\u0000procedures for the model family, and apply them to synthetic and real data. The\u0000family provides a connection of practical and theoretical interest between the\u0000Hollywood model of Crane and Dempsey, and the generalized-gamma graphex model\u0000of Caron and Fox. A key ingredient for the construction of the family is\u0000fragmentation and coagulation duality for integer partitions, and for this we\u0000develop novel duality relations that generalize those of Pitman and Dong,\u0000Goldschmidt and Martin. The duality is also crucially used in our inferential\u0000procedures.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"126 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}