Journal of Nonparametric Statistics: latest articles, page 6

Persistent homology based goodness-of-fit tests for spatial tessellations

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-11-09 DOI: 10.1080/10485252.2023.2280022

Christian Hirsch, Johannes Krebs, Claudia Redenbach

Abstract: Motivated by the rapidly increasing relevance of virtual material design in the domain of materials science, it has become essential to assess whether topological properties of stochastic models for a spatial tessellation are in accordance with a given dataset. Recently, tools from topological data analysis such as the persistence diagram have made it possible to gain profound insights in a variety of application contexts. In this work, we establish the asymptotic normality of a variety of test statistics derived from a tessellation-adapted refinement of the persistence diagram. Since in applications it is common to work with tessellation data subject to interactions, we establish our main results for Voronoi and Laguerre tessellations whose generators form a Gibbs point process. We elucidate how these conceptual results can be used to derive goodness-of-fit tests, and then investigate their power in a simulation study. Finally, we apply our testing methodology to a tessellation describing real foam data.

Keywords: tessellation; topological data analysis; goodness-of-fit; persistence diagram

2010 Mathematics Subject Classifications: 60K35; 60F10; 82C22

Acknowledgments: We thank the two anonymous referees for their careful reading of the manuscript. Their comments and suggestions substantially improved the quality of the presentation. We thank Anne Jung (Helmut Schmidt University Hamburg) for providing the foam sample and Christian Jung (RPTU Kaiserslautern-Landau) for computing the Laguerre approximation.

Disclosure statement: No potential conflict of interest was reported by the author(s).

Funding: Johannes Krebs was partially supported by the German Research Foundation (DFG), Grant Number KR-4977/2-1.

Citations: 0

Approximate tolerance intervals for nonparametric regression models

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-11-02 DOI: 10.1080/10485252.2023.2277260

Yafan Guo, Derek S. Young

Abstract: Tolerance intervals in regression allow the user to quantify, with a specified degree of confidence, bounds for a specified proportion of the sampled population when conditioned on a set of covariate values. While methods are available for tolerance intervals in fully-parametric regression settings, the construction of tolerance intervals for nonparametric regression models has been treated in a limited capacity. This paper fills this gap and develops likelihood-based approaches for the construction of pointwise one-sided and two-sided tolerance intervals for nonparametric regression models. A numerical approach is also presented for constructing simultaneous tolerance intervals. An appealing facet of this work is that the resulting methodology is consistent with what is done for fully-parametric regression tolerance intervals. Extensive coverage studies are presented, which demonstrate very good performance of the proposed methods. The proposed tolerance intervals are calculated and interpreted for analyses involving a fertility dataset and a triceps measurement dataset.

Keywords: bootstrap; boundary effects; coverage probabilities; k-factor; smoothing spline

AMS Subject Classifications: 62G08; 62G15

Acknowledgments: We would like to thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources. The authors are also thankful to the Associate Editor and two reviewers who provided numerous insightful comments that improved the overall quality of this work.

Disclosure statement: No potential conflict of interest was reported by the author(s).

Data availability statement: The fertility data are available at the HFC's website https://www.fertilitydata.org/cgi-bin/data.php. The triceps data are available in the R package MultiKink (Wan and Zhong 2020), and can be accessed by typing data(triceps).

Citations: 0

Penalised estimation of partially linear additive zero-inflated Bernoulli regression models

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-10-27 DOI: 10.1080/10485252.2023.2275056

Minggen Lu, Chin-Shang Li, Karla D. Wagner

Abstract: We develop a practical and computationally efficient penalised estimation approach for fitting partially linear additive models to zero-inflated binary outcome data. To facilitate estimation, B-splines are employed to approximate unknown nonparametric components. A two-stage iterative expectation-maximisation (EM) algorithm is proposed to calculate penalised spline estimates. Large-sample properties such as the uniform convergence and the optimal rate of convergence for functional estimators, and the asymptotic normality and efficiency for regression coefficient estimators, are established. Further, two variance-covariance estimation approaches are proposed to provide reliable Wald-type inference for regression coefficients. We conducted an extensive Monte Carlo study to evaluate the numerical properties of the proposed penalised methodology and compare it to the competing spline method [Li and Lu. ‘Semiparametric Zero-Inflated Bernoulli Regression with Applications’, Journal of Applied Statistics, 49, 2845–2869]. The methodology is further illustrated by an egocentric network study.

Keywords: additive Bernoulli regression; B-spline; EM algorithm; penalised estimation; zero-inflated

AMS Subject Classifications: 62G05; 62G20; 62G08

Acknowledgments: The authors are grateful to the Editor, the Associate Editor, and two reviewers for their useful comments and constructive suggestions which led to significant improvement in the revised manuscript.

Disclosure statement: No potential conflict of interest was reported by the author(s).

Funding: This research was partially supported by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health under Award Number R01DA038185.
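The role of the EM algorithm in a zero-inflated Bernoulli model can be illustrated by its E-step. The sketch below assumes the common generic formulation in which a subject is a structural zero with probability 1 - p and otherwise has a Bernoulli outcome with success probability pi; it is not the authors' two-stage penalised algorithm, and the function name and parameter names are chosen here for illustration only.

```python
def posterior_susceptible(y, p, pi):
    """E-step weight: P(subject is not a structural zero | Y = y).

    Assumed generic zero-inflated Bernoulli model (illustrative):
      p  : probability of belonging to the susceptible class
      pi : P(Y = 1 | susceptible), e.g. from a regression with spline terms
    """
    if y == 1:
        # A positive outcome can only come from the susceptible class.
        return 1.0
    denom = 1.0 - p * pi  # P(Y = 0) = (1 - p) + p * (1 - pi)
    if denom <= 0.0:
        raise ValueError("Y = 0 has probability zero under these parameters")
    return p * (1.0 - pi) / denom
```

In an EM iteration these weights would replace the unobserved class labels when the regression components are updated in the M-step.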

Citations: 0

A modified Nadaraya–Watson procedure for variable selection and nonparametric prediction with missing data

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-10-18 DOI: 10.1080/10485252.2023.2270079

Kin Yap Cheung, Stephen M. S. Lee

Abstract: We propose a new method for variable selection and prediction under a nonparametric regression setting, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. Despite its practical relevance, the problem has received little attention in the literature and its solutions are largely non-existent. Our proposal hinges on the construction of a modified Nadaraya–Watson estimator of the conditional mean regression function, with its bandwidths regularised to select variables and its weights adapted to accommodate different types of missingness. The method allows for information sharing across different missing data patterns without affecting consistency of the estimator. Unlike other conventional methods such as those based on imputations or likelihoods, our method requires only mild assumptions on the model and the missingness mechanism. For prediction we focus on finding relevant variables for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. Our theoretical and numerical results show that the new method is consistent in variable selection and yields better prediction accuracy compared to existing methods.

Keywords: Nadaraya–Watson estimator; missing data; nonparametric regression; variable selection

Disclosure statement: No potential conflict of interest was reported by the author(s).
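For orientation, the classical (unmodified) Nadaraya–Watson estimator that the proposal builds on can be sketched as follows. This is a minimal illustration with complete data and a Gaussian product kernel; it is not the authors' modified estimator, and the function name and the per-covariate bandwidth list `h` are assumptions made for the example.

```python
import math

def nadaraya_watson(x0, X, y, h):
    """Classical Nadaraya-Watson estimate of E[Y | X = x0].

    x0: query point (length d); X: list of n covariate vectors (length d);
    y: list of n responses; h: list of d positive bandwidths.
    Illustrative only: complete data, Gaussian product kernel.
    """
    weights = []
    for xi in X:
        # Product kernel: one Gaussian factor per covariate, each with its own bandwidth.
        q = sum(((a - b) / hj) ** 2 for a, b, hj in zip(xi, x0, h))
        weights.append(math.exp(-0.5 * q))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidths")
    # Kernel-weighted average of the responses.
    return sum(w * yi for w, yi in zip(weights, y)) / total
```

A very large bandwidth on one covariate makes its kernel factor nearly constant, so that covariate effectively drops out of the weights; this gives some intuition for how regularised bandwidths can act as variable selectors.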

Citations: 0

Threshold selection for extremal index estimation

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-10-13 DOI: 10.1080/10485252.2023.2266050

Natalia M. Markovich, Igor V. Rodionov

Abstract: We propose a new threshold selection method for nonparametric estimation of the extremal index of stochastic processes. The discrepancy method was proposed as a data-driven smoothing tool for estimation of a probability density function. Now it is modified to select a threshold parameter of an extremal index estimator. A modification of the discrepancy statistic based on the Cramér–von Mises–Smirnov statistic $\omega^2$ is calculated from the $k$ largest order statistics instead of the entire sample. Its asymptotic distribution as $k \to \infty$ is proved to coincide with the $\omega^2$-distribution. Its quantiles are used as discrepancy values. The convergence rate of an extremal index estimate coupled with the discrepancy method is derived. The discrepancy method is used as an automatic threshold selection for the intervals and K-gaps estimators. It may be applied to other estimators of the extremal index. The performance of our method is evaluated by simulated and real data examples.

Keywords: Cramér–von Mises–Smirnov statistic; discrepancy method; extremal index; nonparametric estimation; threshold selection

AMS Subject Classification: 62G32

Disclosure statement: No potential conflict of interest was reported by the author(s).

Notes:
1. The connection between (1) $\omega_n^2 = n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 \, dF(x)$ and (2) $\hat{\omega}_n^2(h) = \sum_{i=1}^{n} \left( \hat{F}_h(X_{i,n}) - \frac{i - 0.5}{n} \right)^2 + \frac{1}{12n}$ can be found in Markovich (2007, p. 81).
2. Theoretically, events $\{T_i = 1\}$ are allowed. In practice, such cases related to single inter-arrival times between consecutive exceedances are meaningless.
3. The modification $(\hat{\omega}_n^2 - 0.4/n + 0.6/n^2)(1 + 1/n)$ of the classical statistic (2) eliminates the dependence of the percentage points of the C–M–S statistic on the sample size (Stephens 1974). For $n > 40$ it changes the statistic by less than one percent. One can use the modification with regard to $\tilde{\omega}_L^2(\hat{\theta})$ for finite $L$ due to the closeness of its distribution to the limit distribution of the C–M–S statistic by Theorem 3.2.

Funding: The work of N.M. Markovich in Sections 1, 2, 4 and 5 was supported by the Russian Science Foundation [grant number 22-21-00177]. The work of I. V. Rodionov in Section 3 and proofs in Markovich and Rodionov (2022) was performed at the Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences with the support of the Russian Science Foundation (grant No. 21-71-00035).
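The classical Cramér–von Mises–Smirnov statistic and the sample-size modification mentioned in the notes can be computed in a few lines. The sketch below is a simplified illustration: it evaluates the statistic on a full sample against a user-supplied distribution function `cdf`, whereas the paper works with the $k$ largest order statistics and an estimated distribution. The function names are assumptions made for this example.

```python
def cramer_von_mises(sample, cdf):
    """Classical statistic: sum_i (F(X_(i)) - (i - 0.5)/n)^2 + 1/(12 n)."""
    xs = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    stat = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        stat += (cdf(x) - (i - 0.5) / n) ** 2
    return stat

def stephens_modified(stat, n):
    """Modification (stat - 0.4/n + 0.6/n^2)(1 + 1/n) that reduces the
    dependence of the percentage points on the sample size n."""
    return (stat - 0.4 / n + 0.6 / n ** 2) * (1.0 + 1.0 / n)
```

For a sample whose transformed values fall exactly on the points (i - 0.5)/n, the sum vanishes and only the 1/(12n) term remains, which gives a quick sanity check of an implementation.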

Citations: 3

Model-free prediction of time series: a nonparametric approach

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-10-11 DOI: 10.1080/10485252.2023.2266740

Mohammad Mohammadi, Meng Li

Abstract: We propose a novel approach for model-free time series forecasting. Unlike most existing methods, the proposed method does not rely on parametric error distributions nor assume parametric forms of the mean function, leading to broad applicability. We achieve such generality by establishing a simple but powerful representation of a time series $\{X_t; t \in \mathbb{Z}\}$ with $\sup_t E|X_t| < \infty$, that is, $X_t$ has a solution which is a linear combination of infinitely many past values. Then, using the obtained solution, a prediction algorithm is presented, with large-sample theoretical guarantees. Simulation studies show favourable performance of the proposed method compared with popular parametric and neural network methods, and suggest its superiority when the sample size is small. An application to practical time series is discussed.

Keywords: prediction; nonparametric methods; neural networks; α-stable distribution

MSC2010 subject classifications: Primary: 60G25; Secondary: 62M20

Disclosure statement: No potential conflict of interest was reported by the author(s).

Notes:
1. See https://www.sciencedirect.com/topics/engineering/left-inverse.

Citations: 0

On estimation of covariance function for functional data with detection limits

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-09-19 DOI: 10.1080/10485252.2023.2258999

Haiyan Liu, Jeanine Houwing-Duistermaat

In many studies on disease progression, biomarkers are restricted by detection limits and hence are informatively missing. Current approaches ignore the problem by simply filling in the value of the detection limit for the missing observations when estimating the mean and covariance functions, which yields inaccurate estimates. Our recent work [Liu and Houwing-Duistermaat (2022), ‘Fast Estimators for the Mean Function for Functional Data with Detection Limits’, Stat, e467.] proposed novel estimators of the mean function for data subject to a detection limit. Inspired by that work, in this paper we propose a novel estimator of the covariance function for sparse and dense data subject to a detection limit. We derive the asymptotic properties of the estimator. We compare our method, via simulations, to the standard method, which ignores the detection limit. We illustrate the new approach by analysing biomarker data subject to a detection limit. In contrast to the standard method, our method appeared to provide more accurate estimates of the covariance. Moreover, its computation time is short.
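Why the fill-in approach distorts covariance estimates can be seen on a toy example. The sketch below uses made-up numbers and hypothetical helper names; it only illustrates the standard substitution method criticised in the abstract and does not implement the authors' estimator.

```python
def substitute_below_limit(values, limit):
    """Standard fill-in: replace every value below the detection limit by the limit."""
    return [max(v, limit) for v in values]

def sample_cov(a, b):
    """Unbiased sample covariance of two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

# Toy biomarker values for four subjects at two time points (made-up data).
t1 = [0.0, 1.0, 2.0, 3.0]
t2 = [0.0, 2.0, 4.0, 6.0]
limit = 1.5  # hypothetical lower detection limit

true_cov = sample_cov(t1, t2)
naive_cov = sample_cov(substitute_below_limit(t1, limit),
                       substitute_below_limit(t2, limit))
print(true_cov, naive_cov)  # the substituted data give a visibly smaller covariance
```

Substitution compresses the lower tail of the data, so the spread and co-movement of the censored measurements are understated.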

Citations: 0

Fighting selection bias in statistical learning: application to visual recognition from biased image databases

Zone 4 (Mathematics), Q3 Statistics & Probability

Journal of Nonparametric Statistics

Pub Date : 2023-09-19 DOI: 10.1080/10485252.2023.2259011

Stephan Clémençon, Pierre Laforgue, Robin Vogel

Abstract: In practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven performances on different population segments has highlighted the representativeness issues induced by a naive aggregation of the datasets. In this paper, we show how biasing models can remedy these problems. Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations, so as to form a nearly debiased estimator of the target distribution. One key condition is that the supports of the biased distributions must partly overlap, and cover the support of the target distribution. In order to meet this requirement in practice, we propose to use a low dimensional image representation, shared across the image databases. Finally, we provide numerical experiments highlighting the relevance of our approach.

Keywords: sampling bias; selection effect; visual recognition; reliable statistical learning

Disclosure statement: No potential conflict of interest was reported by the author(s).

Funding: This work was partially supported by the research chair ‘Good In Tech: Rethinking innovation and technology as drivers of a better world for and by humans’, under the auspices of the ‘Fondation du Risque’ and in partnership with the Institut Mines-Télécom, Sciences Po, Afnor, Ag2r La Mondiale, CGI France, Danone and Sycomore.
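The reweighting idea can be sketched in its simplest form. The example below assumes a single biased sample whose density is proportional to a known, strictly positive biasing function times the target density, and uses self-normalised inverse weights; the paper's setting with several databases and overlapping supports is more general. The function and parameter names are hypothetical.

```python
def debiased_mean(values, biasing_weight, f=lambda x: x):
    """Self-normalised estimate of the target mean of f(X).

    Assumes (illustratively) that each value was drawn with probability
    proportional to biasing_weight(x) times the target density, so that
    weighting by 1 / biasing_weight(x) undoes the selection effect.
    """
    inverse_weights = [1.0 / biasing_weight(x) for x in values]
    total = sum(inverse_weights)
    return sum(w * f(x) for w, x in zip(inverse_weights, values)) / total
```

With a constant biasing function the estimate reduces to the ordinary sample mean; when larger values are over-sampled, their contributions are weighted down accordingly.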



Citations: 0

Efficient nonparametric estimation of generalised autocovariances

Zone 4 Mathematics, Q3 STATISTICS & PROBABILITY

Journal of Nonparametric Statistics

Pub Date : 2023-09-02 DOI: 10.1080/10485252.2023.2252527

Alessandra Luati, Francesca Papagni, Tommaso Proietti

This paper provides a necessary and sufficient condition for asymptotic efficiency of a nonparametric estimator of the generalised autocovariance function of a stationary random process. The generalised autocovariance function is the inverse Fourier transform of a power transformation of the spectral density and encompasses the traditional and inverse autocovariance functions as particular cases. A nonparametric estimator is based on the inverse discrete Fourier transform of the power transformation of the pooled periodogram. We consider two cases: the fixed bandwidth design and the adaptive bandwidth design. The general result on the asymptotic efficiency, established for linear processes, is then applied to the class of stationary ARMA processes and its implications are discussed. Finally, we illustrate that for a class of contrast functionals and spectral densities, the minimum contrast estimator of the spectral density satisfies a Yule–Walker system of equations in the generalised autocovariance estimator.
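
The construction described in this abstract (pool the periodogram, apply a power transformation, then take the inverse discrete Fourier transform) can be sketched as follows. This is a rough illustration only: the scaling of the periodogram, the circular moving-average pooling and the absence of any bias-correction factor are assumptions of this sketch, not details taken from the paper, and the function name `generalised_autocov` is hypothetical.

```python
import numpy as np


def generalised_autocov(x, p=1.0, m=1, max_lag=5):
    """Inverse DFT of the p-th power of a pooled periodogram.

    x       : one-dimensional time series
    p       : power applied to the pooled periodogram
    m       : number of adjacent periodogram ordinates averaged together
    max_lag : largest lag returned

    Caveat: the series is demeaned, so the ordinate at frequency zero is 0;
    with negative p use m > 1 to avoid dividing by zero there.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Periodogram, scaled so that p=1, m=1 gives the circular sample
    # autocovariance after the inverse transform (a convention of this sketch).
    pgram = np.abs(np.fft.fft(x)) ** 2 / n
    # Pool m adjacent ordinates with a circular moving average.
    shifts = range(-(m // 2), m - m // 2)
    pooled = np.mean([np.roll(pgram, s) for s in shifts], axis=0)
    # Power transformation followed by the inverse discrete Fourier transform.
    return np.real(np.fft.ifft(pooled ** p))[: max_lag + 1]


rng = np.random.default_rng(0)
series = rng.normal(size=512)
print(generalised_autocov(series, p=1.0, m=1))   # ordinary (circular) autocovariances
print(generalised_autocov(series, p=-1.0, m=5))  # inverse-autocovariance-type quantities
```

With `p=1` and `m=1` the value at lag 0 equals the sample variance of the series, which is a convenient sanity check on the scaling used here.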


Citations: 0

Nonparametric relative error estimation of the regression function for left truncated and right censored time series data

IF 1.2, Zone 4 Mathematics, Q3 STATISTICS & PROBABILITY

Journal of Nonparametric Statistics

Pub Date : 2023-09-02 DOI: 10.1080/10485252.2023.2241572

N. Bayarassou, F. Hamrani, E. Ould Saïd

Citations: 0