Mixed-integer linear programming for computing optimal experimental designs
Journal of Statistical Planning and Inference, Vol. 234, Article 106200. Pub Date: 2024-06-06. DOI: 10.1016/j.jspi.2024.106200
Radoslav Harman, Samuel Rosa
The problem of computing an exact experimental design that is optimal for the least-squares estimation of the parameters of a regression model is considered. We show that this problem can be solved via mixed-integer linear programming (MILP) for a wide class of optimality criteria, including the criteria of A-, I-, G- and MV-optimality. This approach improves upon the current state-of-the-art mathematical programming formulation, which uses mixed-integer second-order cone programming. The key idea underlying the MILP formulation is McCormick relaxation, which critically depends on finite interval bounds for the elements of the covariance matrix of the least-squares estimator corresponding to an optimal exact design. We provide both analytic and algorithmic methods for constructing these bounds. We also demonstrate the unique advantages of the MILP approach, such as the possibility of incorporating multiple design constraints into the optimization problem, including constraints on the variances and covariances of the least-squares estimator.
{"title":"Mixed-integer linear programming for computing optimal experimental designs","authors":"Radoslav Harman, Samuel Rosa","doi":"10.1016/j.jspi.2024.106200","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106200","url":null,"abstract":"<div><p>The problem of computing an exact experimental design that is optimal for the least-squares estimation of the parameters of a regression model is considered. We show that this problem can be solved via mixed-integer linear programming (MILP) for a wide class of optimality criteria, including the criteria of A-, I-, G- and MV-optimality. This approach improves upon the current state-of-the-art mathematical programming formulation, which uses mixed-integer second-order cone programming. The key idea underlying the MILP formulation is McCormick relaxation, which critically depends on finite interval bounds for the elements of the covariance matrix of the least-squares estimator corresponding to an optimal exact design. We provide both analytic and algorithmic methods for constructing these bounds. We also demonstrate the unique advantages of the MILP approach, such as the possibility of incorporating multiple design constraints into the optimization problem, including constraints on the variances and covariances of the least-squares estimator.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106200"},"PeriodicalIF":0.9,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141323229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Some results for stochastic orders and aging properties related to the Laplace transform
Journal of Statistical Planning and Inference, Vol. 234, Article 106197. Pub Date: 2024-06-05. DOI: 10.1016/j.jspi.2024.106197
Lazaros Kanellopoulos, Konstadinos Politis
We study some properties and relations for stochastic orders and aging classes related to the Laplace transform. In particular, we show that the NBU_Lt class of distributions is closed under convolution. We also obtain results for the ratio of derivatives of the Laplace transform between two distributions.
{"title":"Some results for stochastic orders and aging properties related to the Laplace transform","authors":"Lazaros Kanellopoulos, Konstadinos Politis","doi":"10.1016/j.jspi.2024.106197","DOIUrl":"10.1016/j.jspi.2024.106197","url":null,"abstract":"<div><p>We study some properties and relations for stochastic orders and aging classes related to the Laplace transform. In particular, we show that the NBU<span><math><msub><mrow></mrow><mrow><mtext>Lt</mtext></mrow></msub></math></span> class of distributions is closed under convolution. We also obtain results for the ratio of derivatives of the Laplace transform between two distributions.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106197"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141403038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical theory for image classification using deep convolutional neural network with cross-entropy loss under the hierarchical max-pooling model
Journal of Statistical Planning and Inference, Vol. 234, Article 106188. Pub Date: 2024-06-05. DOI: 10.1016/j.jspi.2024.106188
Michael Kohler , Sophie Langer
Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to improve the theoretical understanding of neural networks as well. Nevertheless, that understanding remains limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analysing the rate of the excess risk of a CNN classifier trained with cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with practical observations about CNNs.
{"title":"Statistical theory for image classification using deep convolutional neural network with cross-entropy loss under the hierarchical max-pooling model","authors":"Michael Kohler , Sophie Langer","doi":"10.1016/j.jspi.2024.106188","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106188","url":null,"abstract":"<div><p>Convolutional neural networks (CNNs) trained with cross-entropy loss have proven to be extremely successful in classifying images. In recent years, much work has been done to also improve the theoretical understanding of neural networks. Nevertheless, it seems limited when these networks are trained with cross-entropy loss, mainly because of the unboundedness of the target function. In this paper, we aim to fill this gap by analysing the rate of the excess risk of a CNN classifier trained by cross-entropy loss. Under suitable assumptions on the smoothness and structure of the a posteriori probability, it is shown that these classifiers achieve a rate of convergence which is independent of the dimension of the image. These rates are in line with the practical observations about CNNs.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106188"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0378375824000454/pdfft?md5=68a8b5f0ef9e0563ac8f09f8ca152533&pid=1-s2.0-S0378375824000454-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141422984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Construction on large four-level designs via quaternary codes
Journal of Statistical Planning and Inference, Vol. 234, Article 106198. Pub Date: 2024-06-05. DOI: 10.1016/j.jspi.2024.106198
Xiangyu Fang , Hongyi Li , Zujun Ou
In this paper, two simple and effective methods are proposed for constructing four-level designs of large size via quaternary codes from small two-level initial designs. Under popular criteria for selecting optimal designs, such as generalized minimum aberration, minimum moment aberration and uniformity measured by the average Lee discrepancy, the close relationships between a constructed four-level design and its initial design are investigated, which provides guidance for choosing a suitable initial design. Moreover, some lower bounds on the average Lee discrepancy of the constructed four-level designs are obtained, which can be used as benchmarks for evaluating their uniformity. Numerical examples show that large four-level designs can be constructed with high efficiency.
{"title":"Construction on large four-level designs via quaternary codes","authors":"Xiangyu Fang , Hongyi Li , Zujun Ou","doi":"10.1016/j.jspi.2024.106198","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106198","url":null,"abstract":"<div><p>In this paper, two simple and effective construction methods are proposed to construct four-level design with large size via quaternary codes from some small two-level initial designs. Under the popular criteria for selecting optimal design, such as generalized minimum aberration, minimum moment aberration and uniformity measured by average Lee discrepancy, the close relationships between the constructed four-level design and its initial design are investigated, which provide the guidance for choosing the suitable initial design. Moreover, some lower bounds of average Lee discrepancy for the constructed four-level designs are obtained, which can be used as a benchmark for evaluating the uniformity of the constructed four-level designs. Some numerical examples show that the large four-level designs can be constructed with high efficiency.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106198"},"PeriodicalIF":0.9,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141302737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Integrative Analysis via Quantile Regression with Homogeneity and Sparsity
Journal of Statistical Planning and Inference, Vol. 234, Article 106196. Pub Date: 2024-06-01. DOI: 10.1016/j.jspi.2024.106196
Hao Zeng , Chuang Wan , Wei Zhong , Tuo Liu
Integrative analysis plays a critical role in combining heterogeneous data from multiple datasets to provide a comprehensive view of the overall data features. However, in multiple datasets, outliers and heavy-tailed data can render least squares estimation unreliable. In response, we propose Robust Integrative Analysis via Quantile Regression (RIAQ), which accounts for homogeneity and sparsity in multiple datasets. The RIAQ approach is not only able to identify latent homogeneous coefficient structures but also to recover the sparsity of high-dimensional covariates via double penalty terms. The integration of sample information across multiple datasets improves estimation efficiency, while a sparse model improves model interpretability. Furthermore, quantile regression allows the detection of subgroup structures under different quantile levels, providing a comprehensive picture of the relationship between the response and high-dimensional covariates. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem and study its convergence. We also derive the parameter selection consistency of the modified Bayesian information criterion. Numerical studies demonstrate that our proposed estimator has satisfactory finite-sample performance, especially in heavy-tailed cases.
{"title":"Robust Integrative Analysis via Quantile Regression with Homogeneity and Sparsity","authors":"Hao Zeng , Chuang Wan , Wei Zhong , Tuo Liu","doi":"10.1016/j.jspi.2024.106196","DOIUrl":"10.1016/j.jspi.2024.106196","url":null,"abstract":"<div><p>Integrative analysis plays a critical role in integrating heterogeneous data from multiple datasets to provide a comprehensive view of the overall data features. However, in multiple datasets, outliers and heavy-tailed data can render least squares estimation unreliable. In response, we propose a Robust Integrative Analysis via Quantile Regression (RIAQ) that accounts for homogeneity and sparsity in multiple datasets. The RIAQ approach is not only able to identify latent homogeneous coefficient structures but also recover the sparsity of high-dimensional covariates via double penalty terms. The integration of sample information across multiple datasets improves estimation efficiency, while a sparse model improves model interpretability. Furthermore, quantile regression allows the detection of subgroup structures under different quantile levels, providing a comprehensive picture of the relationship between response and high-dimensional covariates. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem and study its convergence. We also derive the parameter selection consistency of the modified Bayesian information criterion. Numerical studies demonstrate that our proposed estimator has satisfactory finite-sample performance, especially in heavy-tailed cases.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106196"},"PeriodicalIF":0.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141282198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing truncation dependence: The Gumbel–Barnett copula
Journal of Statistical Planning and Inference, Vol. 234, Article 106194. Pub Date: 2024-05-28. DOI: 10.1016/j.jspi.2024.106194
Anne-Marie Toparkus, Rafael Weißbach
In studies on lifetimes, the population occasionally contains statistical units that were born before the data collection started. Units that died before this start are left-truncated. For all other units, the age at the start of the study is often recorded, and we aim at testing whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel–Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be Exponential, and for the age at the study start, namely the distribution of birth dates, we assume a Uniform distribution. Also for simplicity, and to fit our application, we assume that units that die later than our study period are also truncated. As a result from point process theory, we can approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and the asymptotic distribution of the maximum likelihood estimator are derived. Testing for positive truncation dependence must include the hypothetical independence, which coincides with the boundary of the copula’s parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and copula parameters is distributed as a mixture of a two- and a one-dimensional normal distribution. For the proof, the third parameter, the unobservable sample size, is profiled out. An interesting result is that viewing the data as a truncated sample differs from viewing it as a simple sample from the truncated population, but not by much. The application concerns 55 thousand double-truncated lifetimes of German businesses that closed down over the period 2014 to 2016. The likelihood attains its maximum for the copula parameter at the boundary of the parameter space, so that the p-value of the test is 0.5. The life expectancy does not increase relative to the year of foundation. Using a Farlie–Gumbel–Morgenstern copula, which models both positive and negative dependence, we find that the life expectancy of German enterprises even decreases significantly over time. A simulation under the conditions of the application suggests that the tests retain the nominal level and have good power.
{"title":"Testing truncation dependence: The Gumbel–Barnett copula","authors":"Anne-Marie Toparkus, Rafael Weißbach","doi":"10.1016/j.jspi.2024.106194","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106194","url":null,"abstract":"<div><p>In studies on lifetimes, occasionally, the population contains statistical units that are born before the data collection has started. Left-truncated are units that deceased before this start. For all other units, the age at the study start often is recorded and we aim at testing whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel–Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be Exponential and for the age-at-study-start, namely the distribution of birth dates, we assume a Uniform. Also for simplicity, and to fit our application, we assume that units that die later than our study period, are also truncated. As a result from point process theory, we can approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and asymptotic distribution of the maximum-likelihood estimator are derived. Testing for positive truncation dependence must include the hypothetical independence which coincides with the boundary of the copula’s parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and the copula parameter is distributed as a mixture of a two- and a one-dimensional normal distribution. For the proof, the third parameter, the unobservable sample size, is profiled out. An interesting result is, that it differs to view the data as truncated sample, or, as simple sample from the truncated population, but not by much. The application are 55 thousand double-truncated lifetimes of German businesses that closed down over the period 2014 to 2016. The likelihood has its maximum for the copula parameter at the parameter space boundary so that the <span><math><mi>p</mi></math></span>-value of test is 0.5. The life expectancy does not increase relative to the year of foundation. Using a Farlie–Gumbel–Morgenstern copula, which models positive and negative dependence, finds that life expectancy of German enterprises even decreases significantly over time. A simulation under the condition of the application suggests that the tests retain the nominal level and have good power.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106194"},"PeriodicalIF":0.9,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S037837582400051X/pdfft?md5=a5bc737bb68bd11a1a31f4aeb333c40e&pid=1-s2.0-S037837582400051X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141240222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Construction of 2fi-optimal row–column designs
Journal of Statistical Planning and Inference, Vol. 234, Article 106192. Pub Date: 2024-05-21. DOI: 10.1016/j.jspi.2024.106192
Yingnan Zhang, Jiangmin Pan, Lei Shi
Row–column designs that provide unconfounded estimation of all main effects and the maximum number of two-factor interactions (2fi’s) are called 2fi-optimal. This issue has received great attention recently for its wide application in industrial and physical experiments. Constructions of 2fi-optimal two-level and three-level full factorial and fractional factorial row–column designs have been proposed. However, results for higher prime levels have not yet been achieved. In this paper, we give theoretical constructions of 2fi-optimal s^n full factorial row–column designs for any odd prime level s and any parameter combination, and theoretical constructions of 2fi-optimal s^(n-1) fractional factorial row–column designs for any prime level s and any parameter combination.
{"title":"Construction of 2fi-optimal row–column designs","authors":"Yingnan Zhang, Jiangmin Pan, Lei Shi","doi":"10.1016/j.jspi.2024.106192","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106192","url":null,"abstract":"<div><p>Row–column designs that provide unconfounded estimation of all main effects and the maximum number of two-factor interactions (2fi’s) are called 2fi-optimal. This issue has been paid great attention recently for its wide application in industrial or physical experiments. The constructions of 2fi-optimal two-level and three-level full factorial and fractional factorial row–column designs have been proposed. However, the results for higher prime levels have not been achieved yet. In this paper, we give theoretical constructions of 2fi-optimal <span><math><msup><mrow><mi>s</mi></mrow><mrow><mi>n</mi></mrow></msup></math></span> full factorial row–column designs for any odd prime level <span><math><mi>s</mi></math></span> and any parameter combination, and theoretical constructions of 2fi-optimal <span><math><msup><mrow><mi>s</mi></mrow><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msup></math></span> fractional factorial row–column designs for any prime level <span><math><mi>s</mi></math></span> and any parameter combination.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106192"},"PeriodicalIF":0.9,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141164387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beta regression misspecification tests
Journal of Statistical Planning and Inference, Vol. 233, Article 106193. Pub Date: 2024-05-21. DOI: 10.1016/j.jspi.2024.106193
Francisco Cribari-Neto, José Jairo Santana-e-Silva, Klaus L.P. Vasconcellos
The beta regression model is tailored for responses that assume values in the standard unit interval. It comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for such a model. The tests are based on the information matrix equality, which holds when the model is correctly specified. We establish the validity of the tests in the class of varying precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests’ null and non-null behavior. We show that it is possible to achieve very good control of the type I error probability when data resampling is employed and that the tests are able to reliably detect incorrect model specification, especially when the sample size is not small. An empirical application is presented and discussed.
{"title":"Beta regression misspecification tests","authors":"Francisco Cribari-Neto, José Jairo Santana-e-Silva, Klaus L.P. Vasconcellos","doi":"10.1016/j.jspi.2024.106193","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106193","url":null,"abstract":"<div><p>The beta regression model is tailored for responses that assume values in the standard unit interval. It comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for such a model. The tests are based on the information matrix equality, which holds when the model is correctly specified. We establish the validity of the tests in the class of varying precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests’ null and non-null behavior. We show that it is possible to achieve very good control of the type I error probability when data resampling is employed and that the tests are able to reliably detect incorrect model specification, especially when the sample size is not small. An empirical application is presented and discussed.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"233 ","pages":"Article 106193"},"PeriodicalIF":0.9,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-normalized inference for stationarity of irregular spatial data
Journal of Statistical Planning and Inference, Vol. 234, Article 106191. Pub Date: 2024-05-15. DOI: 10.1016/j.jspi.2024.106191
Richeng Hu , Ngai-Hang Chan , Rongmao Zhang
A self-normalized approach for testing the stationarity of a d-dimensional random field is considered in this paper. Because the discrete Fourier transforms (DFT) at fundamental frequencies of a second-order stationary random field are asymptotically uncorrelated (see Bandyopadhyay and Subba Rao, 2017), one can construct a stationarity test based on the sample covariance of the DFTs. Such a test is usually inferior because it involves an overestimated scale parameter that leads to low size and power. To circumvent this shortcoming, this paper proposes two self-normalized statistics based on the extreme value and the partial sum of the sample covariance of the DFTs. Under certain regularity conditions, it is shown that the proposed tests converge to functionals of Brownian motion. Simulations and a data analysis demonstrate the outstanding performance of the proposed tests.
{"title":"Self-normalized inference for stationarity of irregular spatial data","authors":"Richeng Hu , Ngai-Hang Chan , Rongmao Zhang","doi":"10.1016/j.jspi.2024.106191","DOIUrl":"10.1016/j.jspi.2024.106191","url":null,"abstract":"<div><p>A self-normalized approach for testing the stationarity of a <span><math><mi>d</mi></math></span>-dimensional random field is considered in this paper. Because the discrete Fourier transforms (DFT) at fundamental frequencies of a second-order stationary random field are asymptotically uncorrelated (see Bandyopadhyay and Subba Rao, 2017), one can construct a stationarity test based on the sample covariance of the DFTs. Such a test is usually inferior because it involves an overestimated scale parameter that leads to low size and power. To circumvent this shortcoming, this paper proposes two self-normalized statistics based on extreme value and partial sum of the sample covariance of the DFTs. Under certain regularity conditions, it is shown that the proposed tests converge to functionals of Brownian motion. Simulations and a data analysis demonstrate the outstanding performance of the proposed tests.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"234 ","pages":"Article 106191"},"PeriodicalIF":0.9,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141046356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduced-bias estimation of the extreme conditional tail expectation for Box–Cox transforms of heavy-tailed distributions
Journal of Statistical Planning and Inference, Vol. 233, Article 106189. Pub Date: 2024-05-10. DOI: 10.1016/j.jspi.2024.106189
Michaël Allouche , Jonathan El Methni , Stéphane Girard
Conditional tail expectation (CTE) is a coherent risk measure defined as the mean of the loss distribution above a high quantile. However, the existence of the CTE, as well as the asymptotic properties of the associated estimators, requires integrability conditions that may be violated for heavy-tailed distributions. We introduce Box–Cox transforms of the CTE that have two benefits. First, they alleviate these theoretical issues. Second, they make it possible to recover a number of risk measures such as the conditional tail expectation, the expected shortfall, the conditional value-at-risk and the conditional tail variance. The construction of dedicated estimators is based on the investigation of the asymptotic relationship between Box–Cox transforms of the CTE and quantiles at extreme probability levels, as well as on an extrapolation formula established in the heavy-tailed context. We quantify and estimate the bias induced by the use of these approximations and then introduce reduced-bias estimators whose asymptotic properties are rigorously established. Their finite-sample properties are assessed in a simulation study and illustrated on real data, highlighting the practical interest of both the bias reduction and the Box–Cox transform.
{"title":"Reduced-bias estimation of the extreme conditional tail expectation for Box–Cox transforms of heavy-tailed distributions","authors":"Michaël Allouche , Jonathan El Methni , Stéphane Girard","doi":"10.1016/j.jspi.2024.106189","DOIUrl":"10.1016/j.jspi.2024.106189","url":null,"abstract":"<div><p>Conditional tail expectation (CTE) is a coherent risk measure defined as the mean of the loss distribution above a high quantile. The existence of the CTE as well as the asymptotic properties of associated estimators however require integrability conditions that may be violated when dealing with heavy-tailed distributions. We introduce Box–Cox transforms of the CTE that have two benefits. First, they alleviate these theoretical issues. Second, they enable to recover a number of risk measures such as conditional tail expectation, expected shortfall, conditional value-at-risk or conditional tail variance. The construction of dedicated estimators is based on the investigation of the asymptotic relationship between Box–Cox transforms of the CTE and quantiles at extreme probability levels, as well as on an extrapolation formula established in the heavy-tailed context. We quantify and estimate the bias induced by the use of these approximations and then introduce reduced-bias estimators whose asymptotic properties are rigorously shown. Their finite-sample properties are assessed on a simulation study and illustrated on real data, highlighting the practical interest of both the bias reduction and the Box–Cox transform.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":"233 ","pages":"Article 106189"},"PeriodicalIF":0.9,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141035013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}