
Latest publications from the Journal of Statistical Planning and Inference

Distributed optimal subsampling for quantile regression with massive data
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-18 | DOI: 10.1016/j.jspi.2024.106186
Yue Chao, Xuejun Ma, Boya Zhu

Methods for reducing subsample sizes in distributed settings have become increasingly popular statistical tools in the big data era. Optimal subsample selection for massive linear and generalized linear models with distributed data sources has been thoroughly investigated and widely applied. Nevertheless, few studies have developed distributed optimal subsample selection procedures for quantile regression with massive data. In such settings, the distributed optimal subsampling probabilities and the subset-size selection criteria need to be established simultaneously. In this work, we propose a distributed subsampling technique for quantile regression models. The estimation approach is based on a two-step algorithm for the distributed subsampling procedure. Furthermore, theoretical results, such as the consistency and asymptotic normality of the resulting estimators, are rigorously established under regularity conditions. The performance of the proposed subsampling method is evaluated empirically through simulation experiments and real-data applications.
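A minimal single-machine sketch of the two-step idea in Python: a uniform pilot subsample yields a pilot fit, which defines non-uniform subsampling probabilities for a second, inverse-probability-weighted quantile regression fit. The probability formula |τ − 1{y ≤ x′β}|·‖x‖ is the L-optimal form from the optimal-subsampling literature, assumed here for illustration; the paper's distributed probabilities and subset-size criteria may differ.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(beta, X, y, tau, w):
    r = y - X @ beta
    return np.sum(w * np.maximum(tau * r, (tau - 1) * r))

def fit_wqr(X, y, tau, w):
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]       # crude starting point
    return minimize(pinball, beta0, args=(X, y, tau, w),
                    method="Nelder-Mead").x

rng = np.random.default_rng(0)
n, d, tau = 100_000, 5, 0.5
X = rng.normal(size=(n, d))
beta_true = np.ones(d)
y = X @ beta_true + rng.standard_t(df=3, size=n)        # heavy-tailed errors

# Step 1: uniform pilot subsample and pilot fit.
idx0 = rng.choice(n, size=500, replace=False)
beta_pilot = fit_wqr(X[idx0], y[idx0], tau, np.ones(500))

# Step 2: subsample with probabilities ~ |tau - 1{y <= x'beta}| * ||x||
# (assumed L-optimal form), then refit with inverse-probability weights.
score = np.abs(tau - (y <= X @ beta_pilot)) * np.linalg.norm(X, axis=1)
p = score / score.sum()
idx1 = rng.choice(n, size=2_000, replace=True, p=p)
beta_hat = fit_wqr(X[idx1], y[idx1], tau, 1.0 / (n * p[idx1]))
print(beta_hat)                                         # close to beta_true
```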

Citations: 0
Entropic regularization of neural networks: Self-similar approximations
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-16 | DOI: 10.1016/j.jspi.2024.106181
Amir R. Asadi, Po-Ling Loh

This paper focuses on entropic regularization and its multiscale extension in neural network learning. We leverage established results that characterize the optimizer of entropic regularization methods and their connection with generalization bounds. To avoid the significant computational cost of sampling from the optimal multiscale Gibbs distributions, we describe how to make measured concessions in optimality by using self-similar approximating distributions. We study such scale-invariant approximations for linear neural networks and then extend them to neural networks with nonlinear activation functions. We illustrate the proposed approach through empirical simulations. By navigating the interplay between optimization and computational efficiency, our research contributes to entropic regularization theory, proposing a practical method that embraces symmetry across scales.
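The characterization the abstract leverages is the standard Gibbs variational principle: over densities ρ, the minimizer of E_ρ[L(w)] + λ·KL(ρ‖π) is ρ*(w) ∝ π(w)·exp(−L(w)/λ). A grid-based numerical check of this fact, with a toy loss and prior as assumptions (the paper's multiscale construction is not reproduced):

```python
import numpy as np

w = np.linspace(-4.0, 4.0, 2001)                   # 1-D parameter grid
dw = w[1] - w[0]
L = (w**2 - 1.0)**2                                # toy double-well loss
pi = np.exp(-0.5 * w**2)                           # Gaussian prior, then normalize
pi /= pi.sum() * dw
lam = 0.1                                          # regularization temperature

gibbs = pi * np.exp(-L / lam)                      # claimed optimizer
gibbs /= gibbs.sum() * dw

def free_energy(rho):
    # E_rho[L] + lam * KL(rho || pi), discretized on the grid.
    kl = np.sum(rho * np.log(np.maximum(rho, 1e-300) / pi)) * dw
    return np.sum(rho * L) * dw + lam * kl

tilted = gibbs * np.exp(0.3 * w)                   # any competing density
tilted /= tilted.sum() * dw
print(free_energy(gibbs), "<", free_energy(tilted))
```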

Citations: 0
Multiplier subsample bootstrap for statistics of time series
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-15 | DOI: 10.1016/j.jspi.2024.106183
Ruru Ma, Shibin Zhang

Block-based bootstrap, block-based subsampling, and the multiplier bootstrap are three common nonparametric tools for statistical inference under dependent observations. Combining the ideas of these three, a novel resampling approach, the multiplier subsample bootstrap (MSB), is proposed. Instead of generating a resample from the observations, the MSB imitates the statistic by weighting the block-based subsample statistics with independent standard Gaussian random variables. Given asymptotic normality of the statistic, bootstrap validity is established under mild moment conditions. Building on the idea of the MSB, a second resampling approach, the hybrid multiplier subsampling periodogram bootstrap (HMP), is developed to mimic frequency-domain spectral mean statistics. A simulation study demonstrates that both the MSB and the HMP achieve good performance.
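A minimal sketch of the MSB idea for the sample mean of a dependent series: compute the statistic on non-overlapping blocks and perturb the full-sample statistic with Gaussian-weighted block deviations. The AR(1) data, block length, and the 1/N scaling (chosen here so the conditional variance matches the mean's asymptotic variance) are illustrative assumptions; the paper's general construction and validity conditions are not reproduced.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
n, b = 4000, 40                               # series length, block length
x = lfilter([1.0], [1.0, -0.6], rng.normal(size=n))   # AR(1) series

stat = x.mean()                               # statistic of interest
blocks = x[: (n // b) * b].reshape(-1, b)     # non-overlapping blocks
block_stats = blocks.mean(axis=1)             # statistic on each subsample block
N = len(block_stats)

B = 2000
xi = rng.normal(size=(B, N))                  # i.i.d. N(0, 1) multipliers
reps = stat + (xi @ (block_stats - stat)) / N # multiplier replicates
lo, hi = np.quantile(reps, [0.025, 0.975])
print(f"mean = {stat:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```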

Citations: 0
A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-15 | DOI: 10.1016/j.jspi.2024.106184
Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

To investigate neural network parameters, it is easier to study the distribution of parameters than to study the parameters in each neuron. The ridgelet transform is a pseudo-inverse operator that maps a given function f to the parameter distribution γ so that the network NN[γ] reproduces f, i.e. NN[γ] = f. For depth-2 fully-connected networks on a Euclidean space, the ridgelet transform is known in closed form, so we can describe how the parameters are distributed. However, for a variety of modern neural network architectures, a closed-form expression has not been known. In this paper, we explain a systematic method using Fourier expressions to derive ridgelet transforms for a variety of modern networks, such as networks on finite fields F_p, group convolutional networks on an abstract Hilbert space H, fully-connected networks on noncompact symmetric spaces G/K, and pooling layers (the d-plane ridgelet transform).
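For orientation, a sketch of the classical Euclidean closed form referenced above, with constants and the admissibility normalization omitted (the precise statement is in the paper):

```latex
% Depth-2 network as an integral over hidden parameters (a, b):
\[
  \mathrm{NN}[\gamma](x)
    = \int_{\mathbb{R}^d \times \mathbb{R}} \gamma(a, b)\,
      \sigma(a \cdot x - b)\, \mathrm{d}a\, \mathrm{d}b ,
\]
% ridgelet transform of f with respect to a test function \psi:
\[
  (R_{\psi} f)(a, b)
    = \int_{\mathbb{R}^d} f(x)\, \overline{\psi(a \cdot x - b)}\, \mathrm{d}x ,
\]
% and, for an admissible pair (\sigma, \psi), the reconstruction
% NN[R_\psi f] = c_{\sigma,\psi} f holds, which is the closed-form
% pseudo-inverse property described in the abstract.
```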

Citations: 0
Robust nonparametric regression based on deep ReLU neural networks
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-15 | DOI: 10.1016/j.jspi.2024.106182
Juntong Chen

In this paper, we consider robust nonparametric regression using deep neural networks with the ReLU activation function. While several existing, theoretically justified methods are geared towards robustness against identically distributed heavy-tailed noise, the rise of adversarial attacks has emphasized the importance of safeguarding estimation procedures against systematic contamination. We approach this statistical issue by shifting our focus towards estimating conditional distributions. To address it robustly, we introduce a novel estimation procedure based on ℓ-estimation. Under a mild model assumption, we establish general non-asymptotic risk bounds for the resulting estimators, showcasing their robustness against contamination, outliers, and model misspecification. We then delve into the application of our approach using deep ReLU neural networks. When the model is well specified and the regression function belongs to an α-Hölder class, employing ℓ-type estimation on suitable networks enables the resulting estimators to achieve the minimax optimal rate of convergence. Additionally, we demonstrate that deep ℓ-type estimators can circumvent the curse of dimensionality when the regression function closely resembles a composition of several Hölder functions. To attain this, new deep fully-connected ReLU neural networks are designed to approximate this composition class. This approximation result may be of independent interest.
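A minimal sketch of robust fitting with a small fully-connected ReLU network, assuming PyTorch is available. The ℓ-estimation procedure is the paper's own construction and is not reproduced here; a Huber loss stands in as a generic robust criterion under heavy-tailed noise.

```python
import torch

torch.manual_seed(0)
n = 2000
x = torch.rand(n, 1) * 4 - 2
noise = torch.distributions.StudentT(2.0).sample((n, 1))   # heavy-tailed errors
y = torch.sin(3 * x) + 0.1 * noise

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = torch.nn.HuberLoss(delta=1.0)   # robust stand-in, not the paper's estimator

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    opt.step()
print(float(loss))                         # final robust training loss
```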

Citations: 0
Convergence guarantees for forward gradient descent in the linear regression model
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-06 | DOI: 10.1016/j.jspi.2024.106174
Thijs Bos, Johannes Schmidt-Hieber

Renewed interest in the relationship between artificial and biological neural networks motivates the study of gradient-free methods. Considering the linear regression model with random design, we theoretically analyze in this work the biologically motivated (weight-perturbed) forward gradient scheme that is based on a random linear combination of the gradient. If d denotes the number of parameters and k the number of samples, we prove that the mean squared error of this method converges for k ≳ d²log(d) with rate d²log(d)/k. Compared to the dimension dependence d for stochastic gradient descent, an additional factor d log(d) occurs.
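A minimal numpy sketch of the scheme: each update moves along (∇L·v)v for a fresh Gaussian direction v, so only a directional derivative (a forward-mode JVP) is ever needed, never the full gradient. The step size and the constant in k are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
k = 5 * d**2 * int(np.log(d) + 1)        # sample size of order d^2 log(d)
theta_true = rng.normal(size=d)
theta = np.zeros(d)
eta = 1.0 / d**2                          # small step to offset the roughly d-fold
                                          # variance inflation of forward gradients

for _ in range(k):
    x = rng.normal(size=d)
    y = x @ theta_true + 0.1 * rng.normal()
    grad = -(y - x @ theta) * x           # per-sample squared-error gradient
    v = rng.normal(size=d)                # random perturbation direction
    theta -= eta * (grad @ v) * v         # forward gradient step: (grad . v) v
print(np.mean((theta - theta_true)**2))   # per-coordinate MSE, should be small
```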

Citations: 0
Toward improved inference for Krippendorff’s Alpha agreement coefficient
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-05 | DOI: 10.1016/j.jspi.2024.106170
John Hughes

In this article I recommend a better point estimator for Krippendorff’s Alpha agreement coefficient and develop a jackknife variance estimator that leads to much better interval estimation than either the customary bootstrap procedure or an alternative bootstrap procedure. Having developed the new methodology, I analyze nominal data previously analyzed by Krippendorff, along with two experimentally observed datasets: (1) ordinal data from an imaging study of congenital diaphragmatic hernia, and (2) United States Environmental Protection Agency air pollution data for the Philadelphia, Pennsylvania area. The latter two applications are novel. The proposed methodology is supported in version 2.0 of my open-source R package, krippendorffsalpha, which supports common and user-defined distance functions, accommodates any number of units, any number of coders, and missingness, and can parallelize interval computation.
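A minimal sketch of the jackknife construction: delete one unit at a time, recompute Alpha, and form the jackknife variance. The alpha function below is a plain nominal-data Krippendorff's Alpha for a complete units-by-coders array, a simplifying assumption; the krippendorffsalpha package handles general distance functions and missingness.

```python
import numpy as np

def alpha(data):
    # Krippendorff's Alpha for nominal codes, complete units x coders array.
    vals = np.unique(data)
    n_units, m = data.shape
    o = np.zeros((len(vals), len(vals)))          # coincidence matrix
    for row in data:
        idx = np.searchsorted(vals, row)
        for i in range(m):
            for j in range(m):
                if i != j:
                    o[idx[i], idx[j]] += 1.0 / (m - 1)
    n_c = o.sum(axis=1)
    n = n_c.sum()
    d_obs = (o.sum() - np.trace(o)) / n
    d_exp = (n_c.sum()**2 - (n_c**2).sum()) / (n * (n - 1))
    return 1.0 - d_obs / d_exp

rng = np.random.default_rng(3)
data = rng.integers(0, 3, size=(40, 4))           # toy nominal ratings
data[:30] = data[:30, :1]                          # strong agreement on most units
a_hat = alpha(data)

# Leave-one-unit-out jackknife variance and a Wald-type interval.
loo = np.array([alpha(np.delete(data, u, axis=0)) for u in range(len(data))])
n_u = len(data)
se = np.sqrt((n_u - 1) / n_u * np.sum((loo - loo.mean())**2))
print(f"alpha = {a_hat:.3f}, 95% CI = ({a_hat - 1.96*se:.3f}, {a_hat + 1.96*se:.3f})")
```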

Citations: 0
Informed censoring: The parametric combination of data and expert information
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-05 | DOI: 10.1016/j.jspi.2024.106171
Hansjörg Albrecher, Martin Bladt

The statistical censoring setup is extended to the situation in which random measures can be assigned to the realization of datapoints, leading to a new way of incorporating expert information into the usual parametric estimation procedures. Asymptotic theory is provided for the resulting estimators, and some special cases of practical relevance are studied in detail. Although the proposed framework mathematically generalizes censoring and coarsening at random, and borrows techniques from M-estimation theory, it provides a novel and transparent methodology that enjoys significant practical applicability in situations where expert information is present. The potential of the approach is illustrated by a concrete actuarial application: tail-parameter estimation for a heavy-tailed MTPL dataset with limited available expert information.
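A minimal sketch of one special case of such a framework, where expert information takes the form of intervals: exact observations enter the likelihood through the density, expert-assessed ones through the probability mass of their interval. The Pareto tail model and all numbers are illustrative assumptions, not the paper's MTPL analysis.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
alpha_true = 1.8
exact = rng.pareto(alpha_true, size=300) + 1.0     # fully observed losses, Pareto(xm=1)
hidden = rng.pareto(alpha_true, size=100) + 1.0    # losses known only to experts
lo, hi = 0.8 * hidden, 1.5 * hidden                # expert-supplied intervals

def neg_loglik(a):
    ll = np.sum(np.log(a) - (a + 1) * np.log(exact))             # density part
    ll += np.sum(np.log(np.maximum(lo, 1.0)**(-a) - hi**(-a)))   # interval mass part
    return -ll

a_hat = minimize_scalar(neg_loglik, bounds=(0.1, 10.0), method="bounded").x
print(a_hat)                                       # close to alpha_true
```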

Citations: 0
Non-asymptotic model selection for models of network data with parameter vectors of increasing dimension
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-04-05 | DOI: 10.1016/j.jspi.2024.106173
Sean Eli, Michael Schweinberger

Model selection for network data is an open area of research. Using the β-model as a convenient starting point, we propose a simple and non-asymptotic approach to model selection for β-models with and without constraints. Simulations indicate that the proposed model selection approach selects the data-generating model with high probability, in contrast to classical and extended Bayesian Information Criteria. We conclude with an application to the Enron email network, which has 181,831 connections among 36,692 employees.
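For background, a minimal sketch of β-model fitting by the standard fixed-point iteration on the degree equations d_i = Σ_{j≠i} e^{β_i+β_j}/(1+e^{β_i+β_j}); the paper's non-asymptotic selection criterion is its own construction and is not reproduced here. The toy graph and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
beta_true = rng.normal(scale=0.5, size=n)
P = 1.0 / (1.0 + np.exp(-(beta_true[:, None] + beta_true[None, :])))
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                        # simple undirected graph
deg = A.sum(axis=1)

beta = np.zeros(n)
for _ in range(200):                               # fixed-point iteration on degrees
    E = np.exp(beta)
    denom = E[None, :] / (1.0 + np.exp(beta[:, None] + beta[None, :]))
    np.fill_diagonal(denom, 0.0)
    beta = np.log(deg) - np.log(denom.sum(axis=1))
print(np.corrcoef(beta, beta_true)[0, 1])          # should be close to 1
```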

Citations: 0
Hermite regression estimation in noisy convolution model
IF 0.9 | CAS Zone 4 (Mathematics) | Q2 Mathematics | Pub Date: 2024-03-26 | DOI: 10.1016/j.jspi.2024.106168
Ousmane Sacko

In this paper, we consider the following regression model: y(kT/n) = f⋆g(kT/n) + ε_k, k = −n, …, n−1, with T fixed, where g is known and f is the unknown function to be estimated. The errors (ε_k)_{−n≤k≤n−1} are independent and identically distributed, centered, and have finite, known variance. Two adaptive estimation methods for f are considered by exploiting the properties of the Hermite basis. We study the quadratic risk of each estimator. If f belongs to a Sobolev regularity space, we derive rates of convergence. Adaptive procedures for selecting the relevant parameter, inspired by the Goldenshluger and Lepski method, are proposed, and we prove that the resulting estimators satisfy oracle inequalities for sub-Gaussian ε. Finally, we illustrate these approaches numerically.
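A minimal sketch of the series idea: expand f in the first J Hermite functions, convolve each basis function with the known g numerically, and estimate the coefficients by least squares. The kernel, target, and fixed J are illustrative assumptions; the paper's adaptive, Goldenshluger-Lepski-style choice of the dimension is not reproduced.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_fn(j, u):
    # Orthonormal Hermite function h_j(u) = H_j(u) exp(-u^2/2) / norm.
    c = np.zeros(j + 1); c[j] = 1.0
    norm = math.sqrt(2.0**j * math.factorial(j) * math.sqrt(math.pi))
    return hermval(u, c) * np.exp(-u**2 / 2.0) / norm

rng = np.random.default_rng(6)
T, n = 10.0, 400
t = np.arange(-n, n) * T / n                       # design points kT/n
dx = 0.01
x = np.arange(-15.0, 15.0, dx)                     # grid for numerical convolution

g = lambda u: 0.5 * np.exp(-np.abs(u))             # known (Laplace) kernel
f_true = lambda u: np.exp(-(u - 1.0)**2)           # target, used only to simulate

conv = lambda h: np.array([np.sum(h(x) * g(tk - x)) * dx for tk in t])
y = conv(f_true) + 0.2 * rng.normal(size=len(t))   # observations y(kT/n)

J = 12                                             # fixed dimension (not adaptive)
D = np.column_stack([conv(lambda u, j=j: hermite_fn(j, u)) for j in range(J)])
a_hat = np.linalg.lstsq(D, y, rcond=None)[0]
f_hat = sum(a_hat[j] * hermite_fn(j, x) for j in range(J))
print(np.sqrt(np.sum((f_hat - f_true(x))**2) * dx))   # L2 error of the estimate
```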

Citations: 0