Nonparametric estimation of the quantiles of the conditional residual lifetime distribution
Steven Abrams, Paul Janssen, Noël Veraverbeke
Pub Date: 2026-01-06 | DOI: 10.1016/j.jspi.2026.106373
Journal of Statistical Planning and Inference, Volume 243, Article 106373
In medical research, interest often lies in the association between an event time T1 and a continuous covariate T2, or between two event times T1 and T2, where the event times are typically subject to (right) censoring. Although the strength of dependence between such random variables can be expressed in terms of global and local association measures, it is also of interest to study alternative quantities, such as percentiles of the residual lifetime distribution of T1, conditional on T2 taking values in a given interval. In this paper, we extend existing methods for estimating quantiles of the conditional residual lifetime distribution to accommodate a more flexible classification of subjects into subgroups based on their respective T2-values. More specifically, we propose two estimators, one under one-component censoring and one under univariate censoring, and provide a detailed study of their finite-sample performance. We demonstrate the use of these estimators on two medical datasets, concerning (1) monoclonal gammopathy of undetermined significance and (2) overall mortality in Danish twin members.
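As a purely illustrative sketch of the quantity studied above, the following computes an empirical quantile of the residual lifetime of T1 beyond a time t0, within a subgroup defined by T2-values. It assumes fully observed (uncensored) event times, whereas handling censoring is precisely the paper's contribution; the function name and interface are hypothetical.

```python
import numpy as np

def conditional_residual_quantile(t1, t2, t0, interval, q):
    """Empirical q-quantile of the residual lifetime T1 - t0, among
    subjects still at risk at t0 (T1 > t0) whose T2-value falls in the
    half-open interval [lo, hi).  Toy version: no censoring."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    lo, hi = interval
    mask = (t1 > t0) & (t2 >= lo) & (t2 < hi)
    if not mask.any():
        raise ValueError("no subjects in the requested subgroup")
    return np.quantile(t1[mask] - t0, q)

# subgroup t2 in [0, 5) with t1 > 2 has residual lifetimes 1, 2, 3
t1 = np.array([3.0, 4.0, 5.0, 1.0, 6.0])
t2 = np.array([1.0, 2.0, 3.0, 1.0, 9.0])
median_residual = conditional_residual_quantile(t1, t2, 2.0, (0.0, 5.0), 0.5)
```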
Robust estimation with Latin Hypercube Sampling: A Central Limit Theorem for Z-estimators
Faouzi Hakimi
Pub Date: 2026-01-06 | DOI: 10.1016/j.jspi.2026.106374
Journal of Statistical Planning and Inference, Volume 243, Article 106374
Latin Hypercube Sampling (LHS) is a widely used stratified sampling method in computer experiments. In this work, we extend existing convergence results for the sample mean under LHS to the broader class of Z-estimators, that is, estimators defined as the zeros of a sample mean function. We derive the asymptotic variance of these estimators and show that it is smaller under LHS than under traditional independent and identically distributed sampling. Furthermore, we establish a Central Limit Theorem for Z-estimators under LHS, providing a theoretical foundation for its improved efficiency.
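A minimal sketch of Latin Hypercube Sampling and the variance reduction it yields for a sample mean (the simplest Z-estimator, whose estimating equation is the zero of the sample mean of f(U) - θ). The function name and the additive test function are illustrative choices, not from the paper.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0,1)^d: each margin is cut into n equal strata, one
    point per stratum, with an independent random permutation per dimension."""
    u = np.empty((n, d))
    for j in range(d):
        u[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return u

rng = np.random.default_rng(0)
n, reps = 50, 200
f = lambda u: u.sum(axis=1)  # additive test function; true mean = 1.0 in d = 2

iid_means = [f(rng.random((n, 2))).mean() for _ in range(reps)]
lhs_means = [f(latin_hypercube(n, 2, rng)).mean() for _ in range(reps)]
print(np.var(iid_means), np.var(lhs_means))  # LHS variance is far smaller
```

Additive functions are the best case for LHS, since stratifying each margin removes all main-effect variance from the estimate; for general integrands the gain is smaller but, per the results above, never a loss asymptotically.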
A class of mixed-level amplified designs and their space-filling properties
Zuohang Kang, Zujun Ou
Pub Date: 2025-12-19 | DOI: 10.1016/j.jspi.2025.106372
Journal of Statistical Planning and Inference, Volume 243, Article 106372
With the increasing complexity of experimental scenarios, mixed-level designs of large size are urgently needed. A class of mixed-level designs is constructed through amplification, which enlarges both the run size and the number of factors of an initial design. The space-filling properties of the amplified designs are studied under the generalized minimum aberration criterion, the wordlength enumerator, and the maximin L2-distance criterion; an attainable upper bound on the maximin L2-distance and a lower bound on the wordlength enumerator of amplified designs are obtained. Numerical examples demonstrate that the construction method is simple and effective, and it is recommended for high-dimensional statistical problems and large-scale experiments.
Homogeneity testing under finite mixtures of multivariate Poisson distributions
Guanfu Liu, Yuejiao Fu
Pub Date: 2025-11-27 | DOI: 10.1016/j.jspi.2025.106369
Journal of Statistical Planning and Inference, Volume 243, Article 106369
Finite mixtures of multivariate Poisson (FMMP) distributions have wide applications in the real world. Testing for homogeneity under FMMP models is important; however, to the best of our knowledge, there is no generic solution to this problem. In this paper, we propose an EM-test for homogeneity under FMMP models to fill this gap. We establish the strong consistency of the maximum likelihood estimator of the mixing distribution by relaxing two conditions required in the existing literature. The null limiting distribution of the proposed test is studied, and based on this limiting distribution, a resampling procedure is constructed to approximate the p-value of the test. The loss of strong identifiability for the multivariate Poisson distribution poses a significant challenge in deriving the null limiting distribution. Finally, simulation studies and a real-data analysis demonstrate the good performance of the proposed test.
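The paper's resampling procedure approximates the p-value from the null limiting distribution of the EM-test. A generic Monte Carlo p-value with the standard +1 finite-sample correction, which keeps the test valid for a finite number of resamples, might look as follows (the function name is hypothetical).

```python
import numpy as np

def resampling_pvalue(stat_obs, null_stats):
    """Monte Carlo p-value: the proportion of resampled null statistics at
    least as extreme as the observed one, with the +1 correction that keeps
    the estimate strictly positive and the test level-valid."""
    null_stats = np.asarray(null_stats, dtype=float)
    return (1 + np.sum(null_stats >= stat_obs)) / (1 + null_stats.size)
```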
On deriving Liouville process from Liouville distribution and its application in nonparametric Bayesian inference
Sadegh Chegini, Mahmoud Zarepour
Pub Date: 2025-11-26 | DOI: 10.1016/j.jspi.2025.106368
Journal of Statistical Planning and Inference, Volume 243, Article 106368
The Liouville distribution, a generalization of the Dirichlet distribution, serves as a well-known conjugate prior for the multinomial distribution. Just as the Dirichlet process is derived from the finite-dimensional Dirichlet distribution, it is natural and important to introduce and derive a Liouville process in a similar manner. We introduce a discrete random probability measure constructed from a random vector following a Liouville distribution and subsequently derive its weak limit to define our proposed Liouville process. The resulting process is a spike-and-slab process, where the Dirichlet process serves as the slab and a single point from its mean acts as the spike. These two components are linearly combined using a random weight generated from the Liouville distribution. By using the Liouville process as a prior on the space of probability measures, we derive the corresponding posterior process as well as the predictive distribution.
Semiparametric tests for Lorenz dominance based on density ratio model
Weiwei Zhuang, Weiqi Yang, Wenchen Liao, Yukun Liu
Pub Date: 2025-11-12 | DOI: 10.1016/j.jspi.2025.106361
Journal of Statistical Planning and Inference, Volume 242, Article 106361
Lorenz dominance is a fundamental tool for assessing whether wealth or income disparity is greater in one population than another. Based on the well-established density ratio model, we propose a new semiparametric test for Lorenz dominance. We show that the limiting distribution of the proposed test statistic is the supremum of a Gaussian process. To facilitate practical application, we devise a bootstrap procedure to calculate the p-value and establish its theoretical validity. Our simulation studies demonstrate that the proposed test correctly controls the Type I error and outperforms its competitors in terms of statistical power. Finally, we apply the test to compare salary distributions among higher education employees in Ohio from 2011 to 2015.
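The test above is semiparametric and built on the density ratio model; as a purely empirical illustration of the hypothesis being tested, the following sketch computes empirical Lorenz curves and checks pointwise dominance on a common grid (function names, the interpolation grid, and the tolerance are illustrative choices).

```python
import numpy as np

def lorenz_curve(x):
    """Points L(k/n), k = 0..n, of the empirical Lorenz curve: the
    cumulative share of the total held by the poorest k of n units."""
    x = np.sort(np.asarray(x, dtype=float))
    cum = np.cumsum(x)
    return np.concatenate(([0.0], cum / cum[-1]))

def lorenz_dominates(x, y, grid=None):
    """True if x's Lorenz curve lies weakly above y's on a common grid,
    i.e. x is no more unequal than y; curves are linearly interpolated."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    px = np.linspace(0.0, 1.0, len(x) + 1)
    py = np.linspace(0.0, 1.0, len(y) + 1)
    lx = np.interp(grid, px, lorenz_curve(x))
    ly = np.interp(grid, py, lorenz_curve(y))
    return bool(np.all(lx >= ly - 1e-12))
```

For example, a perfectly equal income vector has the diagonal as its Lorenz curve and therefore dominates any unequal one of the same total-share scale.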
Self-weighted estimation for nonstationary processes with infinite variance GARCH errors
Yuze Yuan, Shuyu Liu, Rongmao Zhang
Pub Date: 2025-11-08 | DOI: 10.1016/j.jspi.2025.106360
Journal of Statistical Planning and Inference, Volume 242, Article 106360
Zhang and Chan (2021) considered the augmented Dickey–Fuller (ADF) test for a unit root process with linear noise driven by generalized autoregressive conditional heteroskedasticity (GARCH), and showed that the ADF test may perform even worse than the Dickey–Fuller test. The main reason is that, with infinite-variance GARCH noise, the parameters of the lag terms in the ADF regression cannot be estimated consistently by least squares estimation (LSE). In this paper, we propose a self-weighted least squares estimation (SWLSE) procedure to solve this problem, together with a new SWLSE-based test for the unit root. We show that the SWLSE is consistent, and that the proposed test statistic converges to a functional of a stable process and a Brownian motion and performs well in terms of size and power. A simulation study is conducted to evaluate the performance of our procedure, and a real-world illustrative example is provided.
Mixed latent graphical models with mixed measurement error and misclassification in variables
Yu Shi, Grace Y. Yi
Pub Date: 2025-11-01 | DOI: 10.1016/j.jspi.2025.106359
Journal of Statistical Planning and Inference, Volume 242, Article 106359
Graphical models are powerful tools for characterizing conditional dependence structures among variables with complex relationships. Although many methods have been developed under the graphical modeling framework, their validity often hinges on the quality of the data. A fundamental assumption in most existing approaches is that all variables are measured precisely, an assumption frequently violated in practice. In many applications, mismeasurement of mixed discrete and continuous variables is a common challenge. In this paper, we address error-contaminated data involving both continuous and discrete variables by proposing a mixed latent Gaussian copula graphical measurement error model. To perform inference, we develop a simulation-based expectation–maximization procedure that explicitly accounts for mismeasurement effects. We further introduce a computationally efficient refinement to reduce the computational burden. Asymptotic properties of the proposed estimator are established, and its finite-sample performance is evaluated through numerical studies.
General sliced minimum aberration designs for multi-platform experiments
Yuliang Zhou, Qianqian Zhao, Shengli Zhao
Pub Date: 2025-10-30 | DOI: 10.1016/j.jspi.2025.106357
Journal of Statistical Planning and Inference, Volume 242, Article 106357
Sliced designs are widely used in multi-platform experiments. A sliced design contains several sub-designs, divided according to the sliced factor, with each sub-design assigned to its own platform. In some experimental scenarios it is necessary to consider the optimality of both the sub-designs and the complete sliced design; such sliced designs are referred to as general sliced (GS) designs. To construct optimal GS designs for such scenarios, we propose the general sliced effect hierarchy principle (GSEHP). Based on the GSEHP, we introduce the general sliced minimum aberration (GSMA) criterion and choose GSMA designs as the optimal GS designs when the sliced factor and the design factors are equally important. Some GSMA designs with 32 and 64 runs are tabulated. Additionally, we present a practical example illustrating the use of GSMA designs to guide webpage-setting strategies on two platforms.
Robust and consistent model evaluation criteria in high-dimensional regression
Sumito Kurata, Kei Hirose
Pub Date: 2025-10-28 | DOI: 10.1016/j.jspi.2025.106358
Journal of Statistical Planning and Inference, Volume 242, Article 106358
Most regularization methods, such as the LASSO, involve one or more regularization parameters, and selecting the value of a regularization parameter is essentially equivalent to selecting a model. Thus, to obtain a model suitable for the data and the phenomenon under study, we need to determine an adequate value of the regularization parameter. For choosing the regularization parameter in the linear regression model, information criteria such as the AIC and BIC are often applied; however, it has been pointed out that these criteria are sensitive to outliers and tend not to perform well in high-dimensional settings. Outliers generally have a negative effect not only on estimation but also on model selection, so it is important to employ a selection method that is robust against outliers. In addition, when the number of explanatory variables is very large, most conventional criteria are prone to selecting unnecessary explanatory variables. In this paper, we propose model evaluation criteria, based on statistical divergences and a quasi-Bayesian procedure, that are robust in both parameter estimation and model selection. Owing to a precise approximation, the proposed criteria achieve selection consistency even in high-dimensional settings, simultaneously with robustness. We also investigate the conditions under which robustness and consistency hold, and provide a concrete example of a divergence and penalty term achieving these desirable properties. Finally, we report numerical examples verifying that the proposed criteria perform robust and consistent variable selection compared with conventional selection methods.
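For contrast with the robust criteria proposed above, here is a minimal sketch of the conventional baseline: exhaustive best-subset selection by BIC in a linear model with intercept. Names are illustrative, and this is exactly the kind of criterion the paper argues is non-robust to outliers and unreliable as the number of variables grows.

```python
import numpy as np
from itertools import combinations

def bic_best_subset(X, y):
    """Fit OLS on every subset of columns of X (plus an intercept) and
    return the subset minimizing BIC = n*log(RSS/n) + (#params)*log(n).
    Exhaustive search: only feasible for small numbers of candidates."""
    n, p = X.shape
    best_bic, best_subset = np.inf, ()
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            Z = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = float(np.sum((y - Z @ beta) ** 2))
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, subset
    return best_subset, best_bic

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=80)  # only column 0 is active
subset, bic = bic_best_subset(X, y)
```

A single outlier in y can shift the RSS enough to change the selected subset, which is the fragility the divergence-based criteria are designed to avoid.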