Pub Date: 2024-06-01 | DOI: 10.1016/j.jspi.2024.106196
Title: Robust Integrative Analysis via Quantile Regression with Homogeneity and Sparsity
Authors: Hao Zeng, Chuang Wan, Wei Zhong, Tuo Liu
Integrative analysis plays a critical role in combining heterogeneous data from multiple datasets to provide a comprehensive view of the overall data features. However, outliers and heavy-tailed data in multiple datasets can render least squares estimation unreliable. In response, we propose Robust Integrative Analysis via Quantile Regression (RIAQ), which accounts for homogeneity and sparsity across multiple datasets. The RIAQ approach is able not only to identify latent homogeneous coefficient structures but also to recover the sparsity of high-dimensional covariates via double penalty terms. The integration of sample information across multiple datasets improves estimation efficiency, while a sparse model improves interpretability. Furthermore, quantile regression allows the detection of subgroup structures at different quantile levels, providing a comprehensive picture of the relationship between the response and high-dimensional covariates. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem and study its convergence. We also derive the parameter selection consistency of a modified Bayesian information criterion. Numerical studies demonstrate that the proposed estimator has satisfactory finite-sample performance, especially in heavy-tailed cases.
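The RIAQ objective itself is not reproduced in this abstract; as a minimal, hypothetical sketch of its quantile-regression building block, the check loss ρ_τ(u) = u(τ − 1{u < 0}) shows why quantile estimates resist the heavy tails that break least squares (the paper's double penalty terms are omitted here):

```python
def check_loss(u, tau):
    # Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_loss(ys, tau):
    # Brute force over the sample points: the empirical tau-quantile
    # minimizes the total check loss, just as the mean minimizes
    # squared loss -- but without the mean's sensitivity to outliers.
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))

data = [1.0, 2.0, 3.0, 10.0, 100.0]  # heavy upper tail
print(sample_quantile_by_loss(data, 0.5))  # 3.0: the median, unmoved by the outlier
```

Varying τ traces out the different subgroup structures the abstract mentions: τ = 0.5 targets the median, τ = 0.75 an upper quantile.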
Pub Date: 2024-05-28 | DOI: 10.1016/j.jspi.2024.106194
Title: Testing truncation dependence: The Gumbel–Barnett copula
Authors: Anne-Marie Toparkus, Rafael Weißbach
In studies on lifetimes, the population occasionally contains statistical units that were born before data collection started. Units that died before this start are left-truncated. For all other units, the age at the study start is often recorded, and we aim to test whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel–Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be exponential, and for the age at study start, namely the distribution of birth dates, we assume a uniform distribution. Also for simplicity, and to fit our application, we assume that units that die after our study period are also truncated. Drawing on point process theory, we approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and the asymptotic distribution of the maximum likelihood estimator are derived. Testing for positive truncation dependence must include the hypothesized independence, which coincides with the boundary of the copula's parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and copula parameters is distributed as a mixture of a two-dimensional and a one-dimensional normal distribution. For the proof, the third parameter, the unobservable sample size, is profiled out. An interesting finding is that viewing the data as a truncated sample differs from viewing it as a simple sample from the truncated population, but not by much. The application is 55 thousand doubly truncated lifetimes of German businesses that closed down over the period 2014 to 2016. The likelihood attains its maximum for the copula parameter at the boundary of the parameter space, so that the p-value of the test is 0.5. Life expectancy does not increase relative to the year of foundation. Using a Farlie–Gumbel–Morgenstern copula, which models both positive and negative dependence, we find that the life expectancy of German enterprises even decreases significantly over time. A simulation under the conditions of the application suggests that the tests retain the nominal level and have good power.
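The abstract does not write out the copula. In its common one-parameter form (an assumption here, not quoted from the paper), the Gumbel–Barnett copula is C(u, v) = uv·exp(−θ ln u ln v), and θ = 0 recovers independence exactly at the boundary of the parameter space, which is what makes the boundary testing problem non-standard:

```python
import math

def gumbel_barnett(u, v, theta):
    # C(u, v) = u * v * exp(-theta * ln(u) * ln(v)) for u, v in (0, 1);
    # theta = 0 recovers the independence copula C(u, v) = u * v,
    # which sits on the boundary of the parameter space.
    return u * v * math.exp(-theta * math.log(u) * math.log(v))

print(gumbel_barnett(0.5, 0.5, 0.0))  # 0.25 = 0.5 * 0.5: exact independence
```

Since ln u and ln v are both negative on (0, 1), any θ > 0 pulls C(u, v) below uv, which is why testing dependence pushes the estimate toward the θ = 0 boundary.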
Pub Date: 2024-05-21 | DOI: 10.1016/j.jspi.2024.106192
Title: Construction of 2fi-optimal row–column designs
Authors: Yingnan Zhang, Jiangmin Pan, Lei Shi
Row–column designs that provide unconfounded estimation of all main effects and the maximum number of two-factor interactions (2fi's) are called 2fi-optimal. This issue has received great attention recently for its wide application in industrial and physical experiments. Constructions of 2fi-optimal two-level and three-level full factorial and fractional factorial row–column designs have been proposed. However, results for higher prime levels have not yet been achieved. In this paper, we give theoretical constructions of 2fi-optimal s^n full factorial row–column designs for any odd prime level s and any parameter combination, and theoretical constructions of 2fi-optimal s^(n−1) fractional factorial row–column designs for any prime level s and any parameter combination.
Pub Date: 2024-05-21 | DOI: 10.1016/j.jspi.2024.106193
Title: Beta regression misspecification tests
Authors: Francisco Cribari-Neto, José Jairo Santana-e-Silva, Klaus L.P. Vasconcellos
The beta regression model is tailored for responses that assume values in the standard unit interval. It comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for such a model. The tests are based on the information matrix equality, which holds when the model is correctly specified. We establish the validity of the tests in the class of varying-precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests' null and non-null behavior. We show that very good control of the type I error probability can be achieved when data resampling is employed, and that the tests reliably detect incorrect model specification, especially when the sample size is not small. An empirical application is presented and discussed.
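The test statistics themselves are not given in this excerpt; a minimal sketch of the varying-precision beta regression being tested, in the usual mean/precision parameterization with logit and log links (the link choices are an assumption here, not quoted from the paper):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def beta_loglik(y, mu, phi):
    # Log-density of Beta(mu * phi, (1 - mu) * phi) at y in (0, 1):
    # the mean/precision parameterization used in beta regression.
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def loglik(y, x, z, beta, gamma):
    # Varying-precision model: logit link for the mean submodel,
    # log link for the precision submodel.
    mu = inv_logit(sum(b * xi for b, xi in zip(beta, x)))
    phi = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    return beta_loglik(y, mu, phi)
```

The information matrix equality compares the expected outer product of the score of this log-likelihood with the negative expected Hessian; a gap between the two signals misspecification.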
Pub Date: 2024-05-15 | DOI: 10.1016/j.jspi.2024.106191
Title: Self-normalized inference for stationarity of irregular spatial data
Authors: Richeng Hu, Ngai-Hang Chan, Rongmao Zhang
A self-normalized approach for testing the stationarity of a d-dimensional random field is considered in this paper. Because the discrete Fourier transforms (DFTs) at the fundamental frequencies of a second-order stationary random field are asymptotically uncorrelated (see Bandyopadhyay and Subba Rao, 2017), one can construct a stationarity test based on the sample covariance of the DFTs. Such a test is usually inferior because it involves an overestimated scale parameter, which leads to low size and power. To circumvent this shortcoming, this paper proposes two self-normalized statistics based on the extreme value and the partial sum of the sample covariance of the DFTs. Under certain regularity conditions, the proposed tests are shown to converge to functionals of Brownian motion. Simulations and a data analysis demonstrate the outstanding performance of the proposed tests.
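The spatial DFT statistics are beyond this excerpt; as a generic illustration of the self-normalization idea the paper relies on (normalizing by recursive partial sums instead of estimating a scale parameter, in the spirit of Shao-type self-normalization; this is not the paper's statistic):

```python
import math

def self_normalized_stat(x):
    # Studentize the sample mean by a normalizer built from recursive
    # partial sums instead of a long-run variance estimate, so no
    # bandwidth or scale parameter has to be chosen.
    n = len(x)
    partial, s = [], 0.0
    for v in x:
        s += v
        partial.append(s)
    total = partial[-1]
    vn2 = sum((partial[t] - (t + 1) / n * total) ** 2 for t in range(n)) / n ** 2
    return (total / n) / math.sqrt(vn2)
```

The resulting statistic converges to a (non-normal) functional of Brownian motion, which is exactly the type of limit the abstract describes for its two test statistics.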
Pub Date: 2024-05-10 | DOI: 10.1016/j.jspi.2024.106189
Title: Reduced-bias estimation of the extreme conditional tail expectation for Box–Cox transforms of heavy-tailed distributions
Authors: Michaël Allouche, Jonathan El Methni, Stéphane Girard
Conditional tail expectation (CTE) is a coherent risk measure defined as the mean of the loss distribution above a high quantile. The existence of the CTE, as well as the asymptotic properties of the associated estimators, however, requires integrability conditions that may be violated for heavy-tailed distributions. We introduce Box–Cox transforms of the CTE that have two benefits. First, they alleviate these theoretical issues. Second, they make it possible to recover a number of risk measures, such as the conditional tail expectation, expected shortfall, conditional value-at-risk and conditional tail variance. The construction of dedicated estimators is based on the asymptotic relationship between Box–Cox transforms of the CTE and quantiles at extreme probability levels, as well as on an extrapolation formula established in the heavy-tailed context. We quantify and estimate the bias induced by these approximations and then introduce reduced-bias estimators whose asymptotic properties are rigorously established. Their finite-sample properties are assessed in a simulation study and illustrated on real data, highlighting the practical interest of both the bias reduction and the Box–Cox transform.
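Neither the transform nor the estimators are reproduced in this excerpt; a naive sketch of the two ingredients — the Box–Cox transform and an empirical CTE above a sample quantile — with the paper's extreme-level extrapolation and bias correction omitted:

```python
import math

def box_cox(x, lam):
    # Box-Cox transform: (x^lam - 1) / lam, with log(x) as lam -> 0.
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def empirical_cte(sample, alpha):
    # Mean of the losses above the empirical alpha-quantile
    # (valid only at moderate alpha; extreme levels need the
    # extrapolation the paper develops).
    ys = sorted(sample)
    tail = ys[int(alpha * len(ys)):]
    return sum(tail) / len(tail)

losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(empirical_cte(losses, 0.8))                 # 9.5: mean of the top 20%
print(box_cox(empirical_cte(losses, 0.8), 0.5))   # transform of the CTE estimate
```

Choosing lam trades off tail integrability against interpretability: lam = 1 leaves the CTE essentially unchanged, while smaller lam damps the heavy tail.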
Pub Date: 2024-05-09 | DOI: 10.1016/j.jspi.2024.106190
Title: Efficient inference of parent-of-origin effect using case-control mother–child genotype data
Authors: Yuang Tian, Hong Zhang, Alexandre Bureau, Hagit Hochner, Jinbo Chen
The parent-of-origin effect plays an important role in mammalian development and disease. Case-control mother–child pair genotype data can be used to detect parent-of-origin effects and are often convenient to collect in practice. Most existing methods for assessing parent-of-origin effects do not incorporate covariates, which may be required to control for confounding factors. We propose to model the parent-of-origin effect through a logistic regression model whose predictors include maternal and child genotypes, parental origins, and covariates. Parental origins may not be fully inferable from the genotypes of a target genetic marker, so we propose using genotypes of markers tightly linked to the target marker to increase inference efficiency. A robust statistical inference procedure is developed based on a modified profile log-likelihood in a retrospective manner. A computationally feasible expectation-maximization algorithm is devised to estimate all unknown parameters involved in the modified profile log-likelihood. This algorithm differs from the conventional expectation-maximization algorithm in that it is based on a modified, rather than the original, profile log-likelihood function. Convergence of the algorithm is established under mild regularity conditions. The algorithm also allows convenient handling of missing child genotypes. Large-sample properties, including weak consistency, asymptotic normality, and asymptotic efficiency, are established for the proposed estimator under mild regularity conditions. Finite-sample properties are evaluated through extensive simulation studies and an application to a real dataset.
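The model is only described in words above; a hypothetical sketch of the kind of logistic regression involved, with all names, coding, and coefficients illustrative rather than the paper's notation:

```python
import math

def disease_prob(beta0, beta_m, beta_c, beta_po, gm, gc, po):
    # Logistic disease-risk model: gm = maternal genotype, gc = child
    # genotype, po = parental-origin indicator (1 if the variant allele
    # is maternally inherited).  A parent-of-origin effect corresponds
    # to beta_po != 0.  Covariates would enter the linear predictor
    # the same way; the retrospective likelihood machinery is omitted.
    eta = beta0 + beta_m * gm + beta_c * gc + beta_po * po
    return 1.0 / (1.0 + math.exp(-eta))
```

The inferential difficulty the abstract addresses is that po is not always observed: it must be inferred probabilistically from the mother–child genotype pair and tightly linked markers, which is where the modified profile log-likelihood and the EM algorithm come in.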
Pub Date: 2024-04-26 | DOI: 10.1016/j.jspi.2024.106187
Title: A dynamic count process
Authors: Namhyun Kim, Pipat Wongsa-art, Yingcun Xia
This paper aims to complement the recent development of observation-driven models of dynamic counts with a parameter-driven one for a general case, particularly discrete two-parameter exponential family distributions. We propose a finite semiparametric exponential mixture of SETAR processes for the conditional mean of the counts to capture nonlinearity and complexity. Because of the intrinsic latency of the conditional mean, a general additive state-space representation of dynamic counts is first proposed; stationarity and geometric ergodicity are then established under a mild set of conditions. We also propose estimating the unknown parameters by quasi maximum likelihood and establish the asymptotic properties of the quasi maximum likelihood estimators (QMLEs), particularly √T-consistency and normality, under relatively mild conditions. Furthermore, the finite-sample properties of the QMLEs are investigated via simulation exercises, and the proposed process is illustrated by applying the method to the intraday transaction counts per minute of AstraZeneca stock.
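The paper's mixture-of-SETAR specification is not reproduced here; as a hypothetical single-component sketch of a SETAR-type recursion for a latent conditional mean driving Poisson counts (all coefficients illustrative, and the exponential-family generality of the paper reduced to the Poisson case):

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's multiplication method; adequate for small lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_setar_counts(n, threshold=5.0, seed=0):
    # Two-regime SETAR-type recursion for the latent conditional mean
    # lambda_t of a count series: the dynamics switch when the previous
    # count crosses the threshold.  Coefficients are illustrative only.
    rng = random.Random(seed)
    lam, y, ys = 1.0, 0, []
    for _ in range(n):
        if y <= threshold:
            lam = 0.5 + 0.6 * y + 0.2 * lam  # lower regime
        else:
            lam = 1.0 + 0.3 * y + 0.1 * lam  # upper regime
        y = poisson_draw(rng, lam)
        ys.append(y)
    return ys
```

Because lambda_t is latent (only the counts y_t are observed), estimation is parameter-driven rather than observation-driven, which is the state-space setting the abstract describes.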
Pub Date: 2024-04-21 | DOI: 10.1016/j.jspi.2024.106172
Title: Consistency of the maximum likelihood estimator of population tree in a coalescent framework
Authors: Arindam RoyChoudhury
We present a proof of consistency of the maximum likelihood estimator (MLE) of the population tree in a previously proposed coalescent model. As the model involves the tree topology as a parameter, the standard proof of consistency for continuous parameters does not directly apply. In addition to proving that a consistent sequence of MLEs exists, we prove that the overall MLE, computed by maximizing the likelihood over all tree topologies, is also consistent. Thus, the MLE of the tree topology is consistent as well. The last result is important because local maxima occur in the likelihood of population trees, especially when the likelihood is maximized separately for each tree topology. Even though the MLE is known to be a dependable estimator under this model, our work proves its effectiveness with mathematical certainty.
Pub Date : 2024-04-19DOI: 10.1016/j.jspi.2024.106185
Miyu Sugimoto , Ryo Okano , Masaaki Imaizumi
The Wasserstein distance is a fundamental tool for comparing probability distributions and has found broad applications in various fields, including image generation with generative adversarial networks. Despite its useful properties, the performance of the Wasserstein distance degrades when data are high-dimensional, a phenomenon known as the curse of dimensionality. To mitigate this issue, extensions of the Wasserstein distance have been developed, such as the sliced Wasserstein distance, which uses one-dimensional projections. However, such extensions lose information about the original data because of the linear projection onto a one-dimensional space. In this paper, we propose novel distances, named augmented projection Wasserstein distances (APWDs), to address these issues; they employ multi-dimensional projection onto a nonlinear surface given by a neural network. The APWDs use a two-step procedure: they first map the data onto a nonlinear surface with a neural network, then linearly project the mapped data into a multi-dimensional space. We also give an algorithm to select a subspace for the multi-dimensional projection. The APWDs are computationally efficient while preserving nonlinear information in the data. We theoretically confirm that the APWDs mitigate the curse of dimensionality. Our experiments demonstrate the APWDs' outstanding performance and robustness to noise, particularly for nonlinear high-dimensional data.
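The sliced Wasserstein distance mentioned above can be sketched in a few lines: each random unit vector projects both samples onto one dimension, where the Wasserstein distance reduces to comparing sorted projections, and the results are averaged over projections. This is a minimal illustrative sketch (assuming equal sample sizes so the order-statistics shortcut applies), not the APWD method itself, which replaces the single linear projection with a neural-network surface followed by a multi-dimensional linear projection.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    X, Y: arrays of shape (n, d) with the SAME number of rows, so the
    1-D Wasserstein distance reduces to comparing sorted projections.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)   # random direction on the unit sphere
        px = np.sort(X @ theta)          # 1-D projections, sorted
        py = np.sort(Y @ theta)
        total += np.mean((px - py) ** 2) # squared 1-D W2 via order statistics
    return np.sqrt(total / n_projections)
```

As the abstract notes, each projection discards all structure orthogonal to its direction, which is the information loss the APWDs are designed to reduce.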
{"title":"Augmented projection Wasserstein distances: Multi-dimensional projection with neural surface","authors":"Miyu Sugimoto , Ryo Okano , Masaaki Imaizumi","doi":"10.1016/j.jspi.2024.106185","DOIUrl":"https://doi.org/10.1016/j.jspi.2024.106185","url":null,"abstract":"<div><p>The Wasserstein distance is a fundamental tool for comparing probability distributions and has found broad applications in various fields, including image generation using generative adversarial networks. Despite its useful properties, the performance of the Wasserstein distance decreases when data is high-dimensional, known as the curse of dimensionality. To mitigate this issue, an extension of the Wasserstein distance has been developed, such as the sliced Wasserstein distance using one-dimensional projection. However, such an extension loses information on the original data, due to the linear projection onto the one-dimensional space. In this paper, we propose novel distances named augmented projection Wasserstein distances (APWDs) to address these issues, which utilize multi-dimensional projection with a nonlinear surface by a neural network. The APWDs employ a two-step procedure; it first maps data onto a nonlinear surface by a neural network, then linearly projects the mapped data into a multidimensional space. We also give an algorithm to select a subspace for the multi-dimensional projection. The APWDs are computationally effective while preserving nonlinear information of data. We theoretically confirm that the APWDs mitigate the curse of dimensionality from data. 
Our experiments demonstrate the APWDs’ outstanding performance and robustness to noise, particularly in the context of nonlinear high-dimensional data.</p></div>","PeriodicalId":50039,"journal":{"name":"Journal of Statistical Planning and Inference","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0378375824000429/pdfft?md5=d9eef2f8ec0fb76099ca4281dc2a0b63&pid=1-s2.0-S0378375824000429-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140632638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}