Journal of Multivariate Analysis: Latest Articles

Bounds and identification on direct and indirect effects under partially observed mediator-endpoint confounders

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-12-03 DOI: 10.1016/j.jmva.2025.105565

Yu Han, Peng Luo, Wei Zhang, Xiang Gu

The direct effect of a treatment variable and the indirect effect through a mediator variable on an endpoint variable are important for understanding a causal mechanism. The controlled direct effect has a prescriptive interpretation, while the natural direct and indirect effects have a descriptive interpretation. In practice, these three effects are usually very difficult to identify. To tackle this problem, some researchers have investigated upper and lower bounds on these three effects under reasonable identification conditions. For example, Luo and Geng (2016) gave upper and lower bounds on these direct and indirect effects when there is an unobserved mediator-endpoint confounder vector and the endpoint variable is continuous. In this paper, we tighten the bounds on the controlled direct effect in Luo and Geng (2016) when part of the confounders can be observed. Additionally, we give a sufficient condition for identifying the direct and indirect effects when the variables satisfy a linear relationship.

Citations: 0

Simultaneous variable selection and estimation of multivariate panel count data

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-12-02 DOI: 10.1016/j.jmva.2025.105559

Lei Ge, Rong Liu, Tao Hu, Jianguo Sun

Panel count data are a general type of data arising from studies of recurrent events. They occur when the observed information on each study subject consists only of the numbers of occurrences of the recurrent events between successive examinations. Such data can occur in many fields, including economics, medicine and the social sciences. This paper considers regression analysis of multivariate panel count data, with a focus on variable selection and estimation of significant covariate effects. For this problem, a minimum information criterion-based method is proposed, and an expectation–maximization algorithm is developed to compute the proposed estimator. Furthermore, the resulting estimator is shown to have the desirable oracle property, and a simulation study confirms the good finite-sample properties of the proposed method. Finally, the method is applied to a set of real data arising from a skin cancer study.

Citations: 0

Robust factor analysis with exponential squared loss

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-12-01 DOI: 10.1016/j.jmva.2025.105567

Jiaqi Hu, Tingyin Wang, Xueqin Wang

The large-dimensional factor model, which aims to reduce dimensionality and extract features through a few latent common factors, has attracted significant interest because of its broad applications. Despite their popularity, traditional methods for factor models may yield incorrect estimators for heavy-tailed data. To address this issue, we introduce the exponential squared loss to the factor model and call the resulting method Robust Exponential Factor Analysis (REFA). We propose a modified rank minimization technique to improve the accuracy of estimating the number of factors in finite samples. Consistency properties for factors and loadings are established under mild conditions, without any moment assumptions on the errors. Simulation studies demonstrate the finite-sample performance of REFA in both light-tailed and heavy-tailed cases. Furthermore, an analysis of a financial dataset of asset returns underscores the superiority of REFA. To help researchers implement the proposed methodology, we have developed an R package named REFA, which is available on CRAN.
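
To illustrate why an exponential squared loss resists heavy tails, here is a minimal Python sketch. It assumes the form commonly written as 1 - exp(-r^2 / gamma); the abstract does not state the exact form or tuning used by REFA, and the names `exp_squared_loss` and `gamma` are hypothetical. This is not the REFA package's API.

```python
import math


def squared_loss(residual):
    """Ordinary squared loss: grows without bound as |residual| grows."""
    return residual * residual


def exp_squared_loss(residual, gamma=1.0):
    """Exponential squared loss, assumed form 1 - exp(-r^2 / gamma).

    The value never exceeds 1, so a single extreme residual cannot dominate
    the total loss. gamma (> 0) is a hypothetical tuning parameter: larger
    values make the loss behave more like a scaled squared loss near zero.
    """
    return 1.0 - math.exp(-residual * residual / gamma)


if __name__ == "__main__":
    # Compare the two losses on a small, a moderate and an extreme residual.
    for r in (0.1, 1.0, 10.0):
        print(r, squared_loss(r), exp_squared_loss(r))
```

For small residuals the two losses are close (with gamma = 1), while for an outlying residual the squared loss keeps growing and the exponential squared loss levels off near 1.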

Citations: 0

Uniform designs for experiments with branching and nested factors

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-12-01 DOI: 10.1016/j.jmva.2025.105576

Feng Yang, Zheng Zhou, Yongdao Zhou

The factors that exist only at certain levels of other factors are called the nested factors. The factors that lead to such nested factors are called the branching factors. Experiments with branching and nested factors occur frequently in practical applications. Designing such experiments is challenging due to the special relationship between the branching and nested factors. In this paper, we propose uniform designs for experiments involving branching and nested factors. A novel criterion is introduced to measure the uniformity of such designs, and the corresponding lower bound is also given. The construction methods of uniform designs for experiments with branching and nested factors are provided, and their effectiveness is verified by simulation comparisons and a practical manufacturing experiment. The proposed method allows each of branching, nested and shared factors to be either qualitative or quantitative. Moreover, the run size and the levels of quantitative factors are very flexible, such that our method works well for both physical and computer experiments.

Citations: 0

Iterative sequential screening strategies for sparse recovery with computational advantages

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-29 DOI: 10.1016/j.jmva.2025.105570

Weixiong Liang, Yuehan Yang

A prominent problem in multi-response models is the presence of complex group structures of the high-dimensional data such as the overlapping group structures. In such models, both responses and predictors are grouped, and each response group is allowed to relate to multiple predictor groups. Ignoring such structures often yields insufficient statistical inference and misleading statistical conclusions. Motivated by practical needs and the sequential canonical correlation search (SCCS) algorithm proposed by Luo and Chen (2020), this paper proposes two computationally attractive feature selection algorithms, reallocating-SCCS (RSCCS) and prescreening-SCCS (PSCCS), for the high-dimensional multi-response models with complex group structures.

The proposed methods, RSCCS and PSCCS, consist of three steps. In the first step, to fully incorporate the information of group structures in the feature selection algorithm, both RSCCS and PSCCS select a non-zero coefficient block according to the canonical correlation between the residual response groups and feature groups. In the second step, RSCCS selects the non-zero coefficient row, while PSCCS conducts screening within the non-zero coefficient block using penalized regularizations. In the third step, RSCCS and PSCCS select features by EBIC based on different situations and different iterations. We demonstrate the advantages of these two methods compared with several existing approaches. The statistical guarantees of RSCCS and PSCCS are established. We provide numerical simulation results and analyze a real data example to compare their performance with other methods.

Citations: 0

Distributed estimation of spiked eigenvalues in spiked population models

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-29 DOI: 10.1016/j.jmva.2025.105558

Lu Yan, Jiang Hu

Advances in science and technology have made voluminous data sets distributed across multiple machines commonplace. Conventional statistical methodologies may be infeasible for analyzing such massive data sets because of prohibitively long computing times, memory constraints, communication overheads, and confidentiality considerations. In this paper, we propose distributed estimators of the spiked eigenvalues in spiked population models. The consistency and asymptotic normality of the distributed estimators are derived, and a statistical error analysis of the distributed estimators is also provided. The proposed distributed estimation has the same order of convergence as estimation from the full sample. A simulation study and a real data analysis indicate that the proposed distributed estimation and testing procedures have excellent properties in terms of estimation accuracy, stability and transmission efficiency.

Citations: 0

The mean tests with high dimensional data

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105564

Wenzhi Yang, Chi Yao, Yiming Liu, Guangming Pan, Wang Zhou

In this paper, we consider mean tests for high-dimensional data and propose two new tests, each consisting of three steps. First, we reduce the high-dimensional vectors to many low-dimensional vectors and construct Hotelling’s $T^{2}$ tests. Second, using the distribution or asymptotic distribution of these Hotelling’s tests under the null hypothesis, we transform the test statistics into random variables that are uniformly or asymptotically uniformly distributed. Third, we obtain central limit theorems for the normalized sum of these transformed variables in both the Gaussian and non-Gaussian cases. Moreover, the asymptotic power of the new test is presented for the non-Gaussian case. Compared with existing tests, our tests have both good empirical sizes and high empirical power.
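
The three steps can be written out for one concrete setting. The following is only a sketch under stated assumptions: a one-sample test of the mean with Gaussian data, blocks of equal dimension $k < n$, and a scaling in step 3 that is valid only if the blocks are independent. The abstract does not give the authors' block construction or normalization, so the symbols $m$, $k$, $G_{k,n-k}$, $U_j$ and $Z_m$ are introduced here for illustration.

```latex
% Step 1: split each observation into m blocks of dimension k and compute
% the one-sample Hotelling statistic of block j (sample mean \bar{X}_j,
% sample covariance S_j with divisor n-1, hypothesized mean \mu_{0,j}).
T_j^{2} = n\,(\bar{X}_j - \mu_{0,j})^{\top} S_j^{-1} (\bar{X}_j - \mu_{0,j}),
\qquad j = 1, \dots, m.

% Step 2: for Gaussian data the null distribution is exact, so the
% probability integral transform with the F_{k,n-k} CDF G_{k,n-k}
% yields a Uniform(0,1) variable.
\frac{n-k}{k(n-1)}\, T_j^{2} \sim F_{k,\,n-k},
\qquad
U_j = G_{k,\,n-k}\!\left( \frac{n-k}{k(n-1)}\, T_j^{2} \right) \sim \mathrm{Uniform}(0,1).

% Step 3: center and scale the sum; \mathrm{Var}(U_j) = 1/12, so under
% independence of the blocks Z_m is approximately N(0,1) for large m.
Z_m = \sqrt{\frac{12}{m}}\; \sum_{j=1}^{m} \left( U_j - \tfrac{1}{2} \right).
```

When the blocks are dependent, the variance of the sum is no longer $m/12$, which is presumably where the paper's central limit theorems do the real work.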

Citations: 0

Rank-based combination independence tests for high-dimensional data

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105550

Liqi Xia, Ruiyuan Cao, Jiang Du, Ling Liu

This paper proposes a novel approach to enhance the versatility of the max-sum test in high-dimensional data analysis by combining two distinct rank correlation coefficients: Spearman’s ρ and Chatterjee’s ξ. We uncovered the independence between the max-type test and the sum-type test by deriving their joint distribution. This insight enables the development of a comprehensive max-sum test that tackles both sparse and dense alternative correlation structures in an adaptive manner. Leveraging the asymptotic independence between the two coefficients and the intrinsic highlights of two single-coefficient tests, we have strategically implemented Cauchy combination principles to devise a multifunctional testing methodology. This approach can accommodate monotonic and nonmonotonic data types and thus offers a versatile solution to a broad spectrum of analytical requirements. This versatility of our proposed method has been impressively demonstrated through a diverse range of simulation data studies and two real-world data analyses, underscoring its effectiveness and practical utility.
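
The Cauchy combination step mentioned above can be sketched in a few lines of Python. This shows the generic Cauchy combination rule for merging p-values, with equal weights assumed by default; the abstract does not say how the authors weight or compute the individual p-values. The function name `cauchy_combination` and the example p-values are hypothetical.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value p is mapped to tan((0.5 - p) * pi), which is standard
    Cauchy distributed when p is Uniform(0, 1). The weighted sum of these
    values is then converted back to a p-value through the Cauchy tail.
    Weights must be non-negative and sum to 1 (equal weights by default).
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / m] * m
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    # Placeholder p-values, e.g. one from a rho-based test and one from a
    # xi-based test; they are illustrative only.
    print(cauchy_combination([0.001, 0.8]))
```

A very small p-value from either test produces a large positive term in the sum, so the combined p-value stays small even when the other test is uninformative.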

Citations: 0

Estimation and testing for fixed effects partially linear nonparametric panel regression model with separable spatially and serially correlated error structure

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105552

Shuangshuang Li, Jianbao Chen

Panel data collected from “locations” may exhibit spatial and serial correlations. In order to study such spatial and serial correlations, and possible existing nonlinear relationships, a fixed effects partially linear nonparametric panel regression model with separable spatially and serially correlated error structure is introduced. We obtain profile quasi-maximum likelihood estimators (PQMLEs) of the unknowns. Furthermore, a generalized F-test called $F_{NT}$ is designed for assessing the reasonability of nonparametric component setting. Asymptotic properties of PQMLEs and $F_{NT}$ are provided under several conditions. Monte Carlo trials imply our estimators and test statistic exhibit good performance in finite samples and model misspecification may lead to substantial influence on the estimates of unknown parameters. The analysis of provincial housing price in China reveals the presence of nonlinear, spatial and serial correlation relationships.

Citations: 0

Statistical guarantees for distribution estimation of contaminated data via DNN-based MoM-GANs

IF 1.4, Zone 3 (Mathematics), Q2 STATISTICS & PROBABILITY

Journal of Multivariate Analysis

Pub Date: 2025-11-28 DOI: 10.1016/j.jmva.2025.105571

Fang Xie, Lihu Xu, Qiuran Yao, Huiming Zhang

This paper investigates the distribution estimation of contaminated data using the MoM-GAN method, which leverages the power of generative adversarial nets (GANs) and median-of-means (MoM) estimation. Specifically, we use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. In terms of theoretical analysis, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, which is measured by integral probability metrics and takes into account the $b$-smoothness Hölder class. The error bound essentially decreases in $n^{-b/p} \vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of the input data, respectively. It provides a rigorous guarantee of the accuracy and robustness of the MoM-GAN estimator, even in the presence of contaminated data. We present an algorithm for the MoM-GAN method and demonstrate its effectiveness in two real-world applications. Our results show that the MoM-GAN method outperforms other competing methods when dealing with contaminated data, highlighting its superior performance and robustness.
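
The median-of-means idea behind MoM-GAN can be shown on plain numbers. The sketch below is the textbook MoM estimator of a mean, not the MoM-GAN algorithm itself, which presumably applies the same block-and-median principle inside the GAN objective. The function name `median_of_means`, the block count and the toy data are hypothetical.

```python
import statistics


def median_of_means(values, n_blocks):
    """Median-of-means estimate of the mean of `values`.

    The data are split into n_blocks consecutive blocks of equal size
    (leftover points at the end are dropped), the mean of each block is
    computed, and the median of those block means is returned. In practice
    the data are usually shuffled first; that is skipped here so the
    example stays deterministic.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    size = len(values) // n_blocks
    block_means = [
        statistics.fmean(values[i * size:(i + 1) * size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # Eight clean points near 1 and one gross outlier (contamination).
    data = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 0.9, 1000.0]
    print("plain mean:     ", statistics.fmean(data))
    print("median of means:", median_of_means(data, n_blocks=3))
```

The outlier corrupts only the block it falls into, so the median of the block means stays close to 1 while the plain mean is pulled far away.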

Citations: 0
