Pub Date: 2023-11-09
DOI: 10.1080/10485252.2023.2280022
Christian Hirsch, Johannes Krebs, Claudia Redenbach
Abstract: Motivated by the rapidly increasing relevance of virtual material design in materials science, it has become essential to assess whether topological properties of stochastic models for a spatial tessellation are in accordance with a given dataset. Recently, tools from topological data analysis such as the persistence diagram have yielded profound insights in a variety of application contexts. In this work, we establish the asymptotic normality of a variety of test statistics derived from a tessellation-adapted refinement of the persistence diagram. Since in applications it is common to work with tessellation data subject to interactions, we establish our main results for Voronoi and Laguerre tessellations whose generators form a Gibbs point process. We elucidate how these conceptual results can be used to derive goodness-of-fit tests, and then investigate their power in a simulation study. Finally, we apply our testing methodology to a tessellation describing real foam data.
Keywords: tessellation; topological data analysis; goodness-of-fit; persistence diagram
2010 Mathematics Subject Classifications: 60K35; 60F10; 82C22
Acknowledgments: We thank the two anonymous referees for their careful reading of the manuscript. Their comments and suggestions substantially improved the quality of the presentation. We thank Anne Jung (Helmut Schmidt University Hamburg) for providing the foam sample and Christian Jung (RPTU Kaiserslautern-Landau) for computing the Laguerre approximation.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Funding: Johannes Krebs was partially supported by the German Research Foundation (DFG), Grant Number KR-4977/2-1.
{"title":"Persistent homology based goodness-of-fit tests for spatial tessellations","authors":"Christian Hirsch, Johannes Krebs, Claudia Redenbach","doi":"10.1080/10485252.2023.2280022","DOIUrl":"https://doi.org/10.1080/10485252.2023.2280022","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":" 8","pages":"0"},"publicationDate":"2023-11-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135241575","RegionNum":4,"RegionCategory":"Mathematics"}
Pub Date: 2023-11-02
DOI: 10.1080/10485252.2023.2277260
Yafan Guo, Derek S. Young
Abstract: Tolerance intervals in regression allow the user to quantify, with a specified degree of confidence, bounds for a specified proportion of the sampled population when conditioned on a set of covariate values. While methods are available for tolerance intervals in fully-parametric regression settings, the construction of tolerance intervals for nonparametric regression models has been treated only in a limited capacity. This paper fills this gap and develops likelihood-based approaches for the construction of pointwise one-sided and two-sided tolerance intervals for nonparametric regression models. A numerical approach is also presented for constructing simultaneous tolerance intervals. An appealing facet of this work is that the resulting methodology is consistent with what is done for fully-parametric regression tolerance intervals. Extensive coverage studies are presented, which demonstrate very good performance of the proposed methods. The proposed tolerance intervals are calculated and interpreted for analyses involving a fertility dataset and a triceps measurement dataset.
Keywords: bootstrap; boundary effects; coverage probabilities; k-factor; smoothing spline
AMS Subject Classifications: 62G08; 62G15
Acknowledgments: We thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources. The authors are also thankful to the Associate Editor and two reviewers who provided numerous insightful comments that improved the overall quality of this work.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Data availability statement: The fertility data are available at the HFC's website https://www.fertilitydata.org/cgi-bin/data.php. The triceps data are available in the R package MultiKink (Wan and Zhong 2020), and can be accessed by typing data(triceps).
{"title":"Approximate tolerance intervals for nonparametric regression models","authors":"Yafan Guo, Derek S. Young","doi":"10.1080/10485252.2023.2277260","DOIUrl":"https://doi.org/10.1080/10485252.2023.2277260","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"33 4","pages":"0"},"publicationDate":"2023-11-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135933082","RegionNum":4,"RegionCategory":"Mathematics"}
Pub Date: 2023-10-27
DOI: 10.1080/10485252.2023.2275056
Minggen Lu, Chin-Shang Li, Karla D. Wagner
Abstract: We develop a practical and computationally efficient penalised estimation approach for partially linear additive models for zero-inflated binary outcome data. To facilitate estimation, B-splines are employed to approximate unknown nonparametric components. A two-stage iterative expectation-maximisation (EM) algorithm is proposed to calculate penalised spline estimates. Large-sample properties, such as the uniform convergence and the optimal rate of convergence for functional estimators, and the asymptotic normality and efficiency for regression coefficient estimators, are established. Further, two variance-covariance estimation approaches are proposed to provide reliable Wald-type inference for regression coefficients. We conducted an extensive Monte Carlo study to evaluate the numerical properties of the proposed penalised methodology and compare it to the competing spline method [Li and Lu. ‘Semiparametric Zero-Inflated Bernoulli Regression with Applications’, Journal of Applied Statistics, 49, 2845–2869]. The methodology is further illustrated by an egocentric network study.
Keywords: additive Bernoulli regression; B-spline; EM algorithm; penalised estimation; zero-inflated
AMS Subject Classifications: 62G05; 62G20; 62G08
Acknowledgments: The authors are grateful to the Editor, the Associate Editor, and two reviewers for their useful comments and constructive suggestions, which led to significant improvement in the revised manuscript.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Funding: This research was partially supported by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health under Award Number R01DA038185.
{"title":"Penalised estimation of partially linear additive zero-inflated Bernoulli regression models","authors":"Minggen Lu, Chin-Shang Li, Karla D. Wagner","doi":"10.1080/10485252.2023.2275056","DOIUrl":"https://doi.org/10.1080/10485252.2023.2275056","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"74 11","pages":"0"},"publicationDate":"2023-10-27","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"136235012","RegionNum":4,"RegionCategory":"Mathematics"}
Pub Date: 2023-10-18
DOI: 10.1080/10485252.2023.2270079
Kin Yap Cheung, Stephen M. S. Lee
Abstract: We propose a new method for variable selection and prediction under a nonparametric regression setting, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. Despite its practical relevance, the problem has received little attention in the literature and its solutions are largely non-existent. Our proposal hinges on the construction of a modified Nadaraya–Watson estimator of the conditional mean regression function, with its bandwidths regularised to select variables and its weights adapted to accommodate different types of missingness. The method allows for information sharing across different missing data patterns without affecting consistency of the estimator. Unlike other conventional methods such as those based on imputations or likelihoods, our method requires only mild assumptions on the model and the missingness mechanism. For prediction we focus on finding relevant variables for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. Our theoretical and numerical results show that the new method is consistent in variable selection and yields better prediction accuracy compared to existing methods.
Keywords: Nadaraya–Watson estimator; missing data; nonparametric regression; variable selection
Disclosure statement: No potential conflict of interest was reported by the author(s).
{"title":"A modified Nadaraya–Watson procedure for variable selection and nonparametric prediction with missing data","authors":"Kin Yap Cheung, Stephen M. S. Lee","doi":"10.1080/10485252.2023.2270079","DOIUrl":"https://doi.org/10.1080/10485252.2023.2270079","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"25 1","pages":"0"},"publicationDate":"2023-10-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135883797","RegionNum":4,"RegionCategory":"Mathematics"}
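For readers unfamiliar with the estimator named in the entry above, the following is a minimal sketch of the classical Nadaraya–Watson estimator with one bandwidth per covariate. It is not the authors' modified procedure: the Gaussian product kernel, the function name `nadaraya_watson`, and the argument names are assumptions made for illustration, and missing-data handling is not shown.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidths):
    """Classical Nadaraya-Watson estimate of E[Y | X = x] (illustrative only).

    x_train: (n, d) covariates, y_train: (n,) responses,
    x_query: (m, d) points to predict at, bandwidths: (d,) positive values.
    A Gaussian product kernel is assumed here; the paper may use another.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    h = np.asarray(bandwidths, dtype=float)

    # Scaled differences between every query point and every training point: (m, n, d).
    diff = (x_query[:, None, :] - x_train[None, :, :]) / h
    # Log of the unnormalised Gaussian product kernel, summed over covariates: (m, n).
    log_w = -0.5 * np.sum(diff ** 2, axis=2)
    # Subtract the row maximum before exponentiating for numerical stability.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    # Kernel-weighted average of the responses.
    return (w @ y_train) / w.sum(axis=1)
```

With a very large bandwidth on one coordinate the kernel is nearly flat in that coordinate, so the covariate has almost no influence on the prediction. This is one common way bandwidths can act as a variable-selection device; whether the paper's regularisation works exactly this way is not stated in the abstract.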
Pub Date: 2023-10-13
DOI: 10.1080/10485252.2023.2266050
Natalia M. Markovich, Igor V. Rodionov
Abstract: We propose a new threshold selection method for nonparametric estimation of the extremal index of stochastic processes. The discrepancy method was originally proposed as a data-driven smoothing tool for estimation of a probability density function. Here it is modified to select a threshold parameter of an extremal index estimator. A modification of the discrepancy statistic based on the Cramér–von Mises–Smirnov statistic $\omega^2$ is calculated from the $k$ largest order statistics instead of the entire sample. Its asymptotic distribution as $k\to\infty$ is proved to coincide with the $\omega^2$-distribution. Its quantiles are used as discrepancy values. The convergence rate of an extremal index estimate coupled with the discrepancy method is derived. The discrepancy method is used as an automatic threshold selection for the intervals and K-gaps estimators. It may be applied to other estimators of the extremal index. The performance of our method is evaluated on simulated and real data examples.
Keywords: Cramér–von Mises–Smirnov statistic; discrepancy method; extremal index; nonparametric estimation; threshold selection
AMS Subject Classification: 62G32
Disclosure statement: No potential conflict of interest was reported by the author(s).
Notes:
1. The connection between
$$\omega_n^2 = n\int_{-\infty}^{\infty}\bigl(F_n(x)-F(x)\bigr)^2\,dF(x) \tag{1}$$
and
$$\hat\omega_n^2(h)=\sum_{i=1}^{n}\Bigl(\hat F_h(X_{i,n})-\frac{i-0.5}{n}\Bigr)^2+\frac{1}{12n} \tag{2}$$
can be found in Markovich (2007, p. 81).
2. Theoretically, events $\{T_i=1\}$ are allowed. In practice, such cases related to single inter-arrival times between consecutive exceedances are meaningless.
3. The modification $(\hat\omega_n^2-0.4/n+0.6/n^2)(1+1/n)$ of the classical statistic (2) eliminates the dependence of the percentage points of the C–M–S statistic on the sample size (Stephens 1974). For $n>40$ it changes the statistic by less than one percent. One can use the modification with regard to $\tilde\omega_L^2(\hat\theta)$ for finite $L$ due to the closeness of its distribution to the limit distribution of the C–M–S statistic by Theorem 3.2.
Funding: The work of N. M. Markovich in Sections 1, 2, 4 and 5 was supported by the Russian Science Foundation [grant number 22-21-00177]. The work of I. V. Rodionov in Section 3 and proofs in Markovich and Rodionov (2022) was performed at the Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences with the support of the Russian Science Foundation (grant No. 21-71-00035).
{"title":"Threshold selection for extremal index estimation","authors":"Natalia M. Markovich, Igor V. Rodionov","doi":"10.1080/10485252.2023.2266050","DOIUrl":"https://doi.org/10.1080/10485252.2023.2266050","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"27 1","pages":"0"},"publicationDate":"2023-10-13","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135805169","RegionNum":4,"RegionCategory":"Mathematics"}
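The computational form of the Cramér–von Mises–Smirnov statistic and the Stephens-type sample-size correction quoted in the notes to this entry can be sketched as follows. This is only an illustration of those two formulas applied to a full sample and a user-supplied distribution function; it does not implement the paper's version based on the k largest order statistics or its threshold selection rule. The function names `cvm_statistic` and `stephens_modified` are hypothetical.

```python
import numpy as np

def cvm_statistic(sample, cdf):
    """Cramer-von Mises-Smirnov statistic in its computational form:
    sum_i (F(X_(i)) - (i - 0.5)/n)^2 + 1/(12 n),
    where X_(1) <= ... <= X_(n) are the order statistics and `cdf` is a
    vectorised distribution function F (an input supplied by the caller).
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return float(np.sum((cdf(x) - (i - 0.5) / n) ** 2) + 1.0 / (12.0 * n))

def stephens_modified(w2, n):
    """Sample-size correction (w2 - 0.4/n + 0.6/n^2) * (1 + 1/n) quoted in the notes."""
    return (w2 - 0.4 / n + 0.6 / n ** 2) * (1.0 + 1.0 / n)
```

When the sample sits exactly at the points (i - 0.5)/n under a uniform distribution function, the sum vanishes and the statistic reduces to its minimum value 1/(12n), which gives a quick sanity check of an implementation.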
Pub Date: 2023-10-11
DOI: 10.1080/10485252.2023.2266740
Mohammad Mohammadi, Meng Li
Abstract: We propose a novel approach for model-free time series forecasting. Unlike most existing methods, the proposed method does not rely on parametric error distributions nor assume parametric forms of the mean function, leading to broad applicability. We achieve such generality by establishing a simple but powerful representation of a time series $\{X_t;\ t\in\mathbb{Z}\}$ with $\sup_t E|X_t|<\infty$, that is, $X_t$ has a solution which is a linear combination of infinite past values. Then, using the obtained solution, a prediction algorithm is presented, with large sample theoretical guarantees. Simulation studies show favourable performance of the proposed method compared with popular parametric and neural network methods, and suggest its superiority when the sample size is small. An application to practical time series is discussed.
Keywords: prediction; nonparametric methods; neural networks; α-stable distribution
MSC2010 subject classifications: Primary 60G25; Secondary 62M20
Disclosure statement: No potential conflict of interest was reported by the author(s).
Notes:
1. See https://www.sciencedirect.com/topics/engineering/left-inverse.
{"title":"Model-free prediction of time series: a nonparametric approach","authors":"Mohammad Mohammadi, Meng Li","doi":"10.1080/10485252.2023.2266740","DOIUrl":"https://doi.org/10.1080/10485252.2023.2266740","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"107 1","pages":"0"},"publicationDate":"2023-10-11","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"136097527","RegionNum":4,"RegionCategory":"Mathematics"}
Pub Date: 2023-09-19
DOI: 10.1080/10485252.2023.2258999
Haiyan Liu, Jeanine Houwing-Duistermaat
Abstract: In many studies on disease progression, biomarkers are restricted by detection limits and are hence informatively missing. Current approaches ignore the problem by simply filling in the value of the detection limit for the missing observations when estimating the mean and covariance functions, which yields inaccurate estimates. Inspired by our recent work [Liu and Houwing-Duistermaat (2022), ‘Fast Estimators for the Mean Function for Functional Data with Detection Limits’, Stat, e467.], in which novel estimators for the mean function for data subject to a detection limit are proposed, in this paper we propose a novel estimator for the covariance function for sparse and dense data subject to a detection limit. We derive the asymptotic properties of the estimator. We compare our method to the standard method, which ignores the detection limit, via simulations. We illustrate the new approach by analysing biomarker data subject to a detection limit. In contrast to the standard method, our method appeared to provide more accurate estimates of the covariance. Moreover, its computation time is small.
{"title":"On estimation of covariance function for functional data with detection limits","authors":"Haiyan Liu, Jeanine Houwing-Duistermaat","doi":"10.1080/10485252.2023.2258999","DOIUrl":"https://doi.org/10.1080/10485252.2023.2258999","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"20 1","pages":"0"},"publicationDate":"2023-09-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"135106403","RegionNum":4,"RegionCategory":"Mathematics"}
Pub Date: 2023-09-19
DOI: 10.1080/10485252.2023.2259011
Stephan Clémençon, Pierre Laforgue, Robin Vogel
Abstract: In practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven performances on different population segments has highlighted the representativeness issues induced by a naive aggregation of the datasets. In this paper, we show how biasing models can remedy these problems. Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations, so as to form a nearly debiased estimator of the target distribution. One key condition is that the supports of the biased distributions must partly overlap, and cover the support of the target distribution. In order to meet this requirement in practice, we propose to use a low dimensional image representation, shared across the image databases. Finally, we provide numerical experiments highlighting the relevance of our approach.
Keywords: sampling bias; selection effect; visual recognition; reliable statistical learning
Disclosure statement: No potential conflict of interest was reported by the author(s).
Funding: This work was partially supported by the research chair ‘Good In Tech: Rethinking innovation and technology as drivers of a better world for and by humans’, under the auspices of the ‘Fondation du Risque’ and in partnership with the Institut Mines-Télécom, Sciences Po, Afnor, Ag2r La Mondiale, CGI France, Danone and Sycomore.
{"title":"Fighting selection bias in statistical learning: application to visual recognition from biased image databases","authors":"Stephan Clémençon, Pierre Laforgue, Robin Vogel","doi":"10.1080/10485252.2023.2259011","DOIUrl":"https://doi.org/10.1080/10485252.2023.2259011","url":null,"abstract":"AbstractIn practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven performances on different population segments has highlighted the representativeness issues induced by a naive aggregation of the datasets. In this paper, we show how biasing models can remedy these problems. Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations, so as to form a nearly debiased estimator of the target distribution. One key condition is that the supports of the biased distributions must partly overlap, and cover the support of the target distribution. In order to meet this requirement in practice, we propose to use a low dimensional image representation, shared across the image databases. 
Finally, we provide numerical experiments highlighting the relevance of our approach.Keywords: Sampling biasselection effectvisual recognitionreliable statistical learning Disclosure statementNo potential conflict of interest was reported by the author(s).Additional informationFundingThis work was partially supported by the research chair ‘Good In Tech : Rethinking innovation and technology as drivers of a better world for and by humans’, under the auspices of the ‘Fondation du Risque’ and in partnership with the Institut Mines-Télécom, Sciences Po, Afnor, Ag2r La Mondiale, CGI France, Danone and Sycomore.","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"181 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135106810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
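The reweighting idea in the abstract of this entry can be illustrated with a deliberately simplified sketch. It assumes a single biased source whose biasing function is known exactly, so that each observation is weighted by the inverse of that function and the weights are normalised to sum to one. The names `debiased_mean` and `bias_fn` are hypothetical, and this is not the authors' estimator, which handles several biased databases with only approximate knowledge of the biasing mechanisms.

```python
import numpy as np

def debiased_mean(x, bias_fn, g=lambda v: v):
    """Self-normalised inverse-bias estimate of E_target[g(X)].

    Assumes the sample x was drawn from a density proportional to
    bias_fn(x) * target_density(x), with bias_fn known and positive.
    """
    x = np.asarray(x, dtype=float)
    w = 1.0 / bias_fn(x)   # undo the (assumed known) biasing function
    w = w / w.sum()        # normalise so the weights sum to one
    return float(np.sum(w * g(x)))

# Size-biased toy sample: larger values were over-sampled in proportion to x.
sample = [1.0, 2.0, 3.0, 4.0]
naive = float(np.mean(sample))                    # ignores the bias
corrected = debiased_mean(sample, lambda v: v)    # down-weights large values
```

With a constant biasing function the weights are uniform and the estimate reduces to the ordinary sample mean, which is a quick sanity check on the weighting.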
This paper provides a necessary and sufficient condition for asymptotic efficiency of a nonparametric estimator of the generalised autocovariance function of a stationary random process. The generalised autocovariance function is the inverse Fourier transform of a power transformation of the spectral density and encompasses the traditional and inverse autocovariance functions as particular cases. A nonparametric estimator is based on the inverse discrete Fourier transform of the power transformation of the pooled periodogram. We consider two cases: the fixed bandwidth design and the adaptive bandwidth design. The general result on the asymptotic efficiency, established for linear processes, is then applied to the class of stationary ARMA processes and its implications are discussed. Finally, we illustrate that for a class of contrast functionals and spectral densities, the minimum contrast estimator of the spectral density satisfies a Yule–Walker system of equations in the generalised autocovariance estimator.
{"title":"Efficient nonparametric estimation of generalised autocovariances","authors":"Alessandra Luati, Francesca Papagni, Tommaso Proietti","doi":"10.1080/10485252.2023.2252527","DOIUrl":"https://doi.org/10.1080/10485252.2023.2252527","url":null,"abstract":"This paper provides a necessary and sufficient condition for asymptotic efficiency of a nonparametric estimator of the generalised autocovariance function of a stationary random process. The generalised autocovariance function is the inverse Fourier transform of a power transformation of the spectral density and encompasses the traditional and inverse autocovariance functions as particular cases. A nonparametric estimator is based on the inverse discrete Fourier transform of the power transformation of the pooled periodogram. We consider two cases: the fixed bandwidth design and the adaptive bandwidth design. The general result on the asymptotic efficiency, established for linear processes, is then applied to the class of stationary ARMA processes and its implications are discussed. Finally, we illustrate that for a class of contrast functionals and spectral densities, the minimum contrast estimator of the spectral density satisfies a Yule–Walker system of equations in the generalised autocovariance estimator.","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134950151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
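To make the definition in the abstract of this entry concrete, the sketch below evaluates the generalised autocovariance of a known spectral density numerically. It assumes the normalisation gamma_{p,k} = (1/2π) ∫ [2π f(ω)]^p cos(ωk) dω over (-π, π), which the abstract does not state, and it works with the population quantity rather than the pooled-periodogram estimator studied in the paper. The function name `generalised_autocovariance` and the MA(1) example are illustrative choices.

```python
import numpy as np

def generalised_autocovariance(two_pi_f, p, max_lag, n_freq=1024):
    """Riemann-sum approximation of (1/2pi) * int [2 pi f(w)]^p cos(w k) dw.

    two_pi_f : callable returning 2*pi*f(w) on an array of frequencies
    p        : power applied to the spectral density (1 = usual autocovariance,
               -1 = inverse autocovariance)
    """
    omega = 2.0 * np.pi * np.arange(n_freq) / n_freq   # uniform grid on [0, 2pi)
    powered = two_pi_f(omega) ** p                     # power transformation
    lags = np.arange(max_lag + 1)
    # Averaging over the grid equals the integral divided by 2pi.
    return (powered[None, :] * np.cos(np.outer(lags, omega))).mean(axis=1)

# MA(1) process X_t = e_t + theta * e_{t-1} with Var(e_t) = sigma2.
theta, sigma2 = 0.5, 2.0
ma1 = lambda w: sigma2 * (1.0 + theta**2 + 2.0 * theta * np.cos(w))

acov = generalised_autocovariance(ma1, p=1, max_lag=2)       # traditional case
inv_acov = generalised_autocovariance(ma1, p=-1, max_lag=2)  # inverse case
```

For this MA(1) example the p = 1 values should match sigma2 * (1 + theta^2), sigma2 * theta and 0, while the p = -1 values should match the autocovariances of an AR(1) process with coefficient -theta and innovation variance 1/sigma2.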
Pub Date: 2023-09-02; DOI: 10.1080/10485252.2023.2241572
N. Bayarassou, F. Hamrani, E. Ould Saïd
{"title":"Nonparametric relative error estimation of the regression function for left truncated and right censored time series data","authors":"N. Bayarassou, F. Hamrani, E. Ould Saïd","doi":"10.1080/10485252.2023.2241572","DOIUrl":"https://doi.org/10.1080/10485252.2023.2241572","url":null,"abstract":"","PeriodicalId":50112,"journal":{"name":"Journal of Nonparametric Statistics","volume":"30 1","pages":""},"PeriodicalIF":1.2,"publicationDate":"2023-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76728038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}