
Journal of Statistical Planning and Inference: Latest Articles

Robust Integrative Analysis via Quantile Regression with Homogeneity and Sparsity
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-06-01 DOI: 10.1016/j.jspi.2024.106196
Hao Zeng, Chuang Wan, Wei Zhong, Tuo Liu

Integrative analysis plays a critical role in integrating heterogeneous data from multiple datasets to provide a comprehensive view of the overall data features. However, in multiple datasets, outliers and heavy-tailed data can render least squares estimation unreliable. In response, we propose a Robust Integrative Analysis via Quantile Regression (RIAQ) that accounts for homogeneity and sparsity in multiple datasets. The RIAQ approach is not only able to identify latent homogeneous coefficient structures but also recover the sparsity of high-dimensional covariates via double penalty terms. The integration of sample information across multiple datasets improves estimation efficiency, while a sparse model improves model interpretability. Furthermore, quantile regression allows the detection of subgroup structures under different quantile levels, providing a comprehensive picture of the relationship between response and high-dimensional covariates. We develop an efficient alternating direction method of multipliers (ADMM) algorithm to solve the optimization problem and study its convergence. We also derive the parameter selection consistency of the modified Bayesian information criterion. Numerical studies demonstrate that our proposed estimator has satisfactory finite-sample performance, especially in heavy-tailed cases.
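The robustness argument in this abstract rests on the quantile check loss replacing squared error. The following is a minimal sketch of that loss only, not of the RIAQ estimator, its double penalty, or its ADMM solver; the function names `check_loss`, `total_loss`, and `empirical_quantile_by_loss` are hypothetical and chosen for illustration.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(q, ys, tau):
    """Sum of check losses of the residuals y - q."""
    return sum(check_loss(y - q, tau) for y in ys)


def empirical_quantile_by_loss(ys, tau):
    """Minimize the total check loss over the observed values.

    The objective is piecewise linear and convex in q, so a minimizer
    is always attained at one of the data points.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


ys = [1, 2, 3, 4, 100]                       # one gross outlier
median = empirical_quantile_by_loss(ys, 0.5)  # 3, unaffected by the outlier
mean = sum(ys) / len(ys)                      # 22.0, pulled up by the outlier
```

With `tau = 0.5` the minimizer is the sample median, which stays at 3 while the least squares solution (the mean) moves to 22.0; this is the sense in which least squares becomes unreliable under outliers.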

Citations: 0
Testing truncation dependence: The Gumbel–Barnett copula
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-28 DOI: 10.1016/j.jspi.2024.106194
Anne-Marie Toparkus, Rafael Weißbach

In lifetime studies, the population occasionally contains statistical units that were born before data collection started. Units that died before this start are left-truncated. For all other units, the age at the study start is often recorded, and we aim to test whether this second measurement is independent of the genuine measure of interest, the lifetime. Our basic model of dependence is the one-parameter Gumbel–Barnett copula. For simplicity, the marginal distribution of the lifetime is assumed to be Exponential, and for the age at study start, namely the distribution of birth dates, we assume a Uniform distribution. Also for simplicity, and to fit our application, we assume that units that die after the end of our study period are also truncated. As a result from point process theory, we can approximate the truncated sample by a Poisson process and thereby derive its likelihood. Identification, consistency and the asymptotic distribution of the maximum-likelihood estimator are derived. Testing for positive truncation dependence must include the hypothetical independence, which coincides with the boundary of the copula’s parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and the copula parameter is distributed as a mixture of a two- and a one-dimensional normal distribution. For the proof, the third parameter, the unobservable sample size, is profiled out. An interesting result is that viewing the data as a truncated sample differs from viewing them as a simple sample from the truncated population, but not by much. The application comprises 55 thousand double-truncated lifetimes of German businesses that closed down over the period 2014 to 2016. The likelihood has its maximum for the copula parameter at the parameter space boundary, so that the p-value of the test is 0.5. The life expectancy does not increase relative to the year of foundation. Using a Farlie–Gumbel–Morgenstern copula, which models positive and negative dependence, we find that the life expectancy of German enterprises even decreases significantly over time. A simulation under the conditions of the application suggests that the tests retain the nominal level and have good power.
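To make the boundary case in this abstract concrete, here is a small sketch of the one-parameter Gumbel–Barnett copula in its textbook form, C(u, v) = u v exp(−θ ln u ln v) with θ in [0, 1]. It is an assumption that the paper uses exactly this parametrization; the function name `gumbel_barnett` is hypothetical, and the sketch does not cover truncation, the Poisson-process likelihood, or the test itself.

```python
import math


def gumbel_barnett(u, v, theta):
    """Textbook Gumbel-Barnett copula C(u, v) = u*v*exp(-theta*ln(u)*ln(v)).

    Assumed parametrization: 0 < u, v <= 1 and 0 <= theta <= 1.
    """
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    if not (0.0 <= theta <= 1.0):
        raise ValueError("theta must lie in [0, 1]")
    return u * v * math.exp(-theta * math.log(u) * math.log(v))


# At theta = 0 the copula reduces to the independence copula u * v,
# i.e. independence sits on the boundary of the parameter space.
independent = gumbel_barnett(0.3, 0.7, 0.0)

# Uniform margins are preserved: C(u, 1) = u for any admissible theta.
margin = gumbel_barnett(0.4, 1.0, 1.0)
```

Because independence corresponds to the boundary value θ = 0 rather than an interior point, the usual interior-point asymptotics do not apply, which is consistent with the non-standard mixture limit described in the abstract.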

Citations: 0
Construction of 2fi-optimal row–column designs
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-21 DOI: 10.1016/j.jspi.2024.106192
Yingnan Zhang, Jiangmin Pan, Lei Shi

Row–column designs that provide unconfounded estimation of all main effects and the maximum number of two-factor interactions (2fi’s) are called 2fi-optimal. This issue has received considerable attention recently because of its wide application in industrial and physical experiments. Constructions of 2fi-optimal two-level and three-level full factorial and fractional factorial row–column designs have been proposed. However, results for higher prime levels have not yet been obtained. In this paper, we give theoretical constructions of 2fi-optimal $s^n$ full factorial row–column designs for any odd prime level $s$ and any parameter combination, and theoretical constructions of 2fi-optimal $s^{n-1}$ fractional factorial row–column designs for any prime level $s$ and any parameter combination.

Citations: 0
Beta regression misspecification tests
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-21 DOI: 10.1016/j.jspi.2024.106193
Francisco Cribari-Neto, José Jairo Santana-e-Silva, Klaus L.P. Vasconcellos

The beta regression model is tailored for responses that assume values in the standard unit interval. It comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for such a model. The tests are based on the information matrix equality, which holds when the model is correctly specified. We establish the validity of the tests in the class of varying precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests’ null and non-null behavior. We show that it is possible to achieve very good control of the type I error probability when data resampling is employed and that the tests are able to reliably detect incorrect model specification, especially when the sample size is not small. An empirical application is presented and discussed.
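For reference, the information matrix equality on which such tests are based can be written in generic notation as follows. The symbols are assumptions introduced here, not taken from the paper: $\ell(\theta)$ denotes the log-likelihood contribution of a single observation and $\theta$ the full parameter vector (mean and precision submodel coefficients), with expectations taken under the true model.

```latex
\mathbb{E}\!\left[
  \frac{\partial \ell(\theta)}{\partial \theta}\,
  \frac{\partial \ell(\theta)}{\partial \theta^{\top}}
\right]
= -\,\mathbb{E}\!\left[
  \frac{\partial^{2} \ell(\theta)}{\partial \theta\,\partial \theta^{\top}}
\right]
```

When the model is misspecified the two sides generally differ, so a test statistic can be built from the sample analogue of their difference.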

Citations: 0
Self-normalized inference for stationarity of irregular spatial data
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-15 DOI: 10.1016/j.jspi.2024.106191
Richeng Hu, Ngai-Hang Chan, Rongmao Zhang

A self-normalized approach for testing the stationarity of a d-dimensional random field is considered in this paper. Because the discrete Fourier transforms (DFT) at fundamental frequencies of a second-order stationary random field are asymptotically uncorrelated (see Bandyopadhyay and Subba Rao, 2017), one can construct a stationarity test based on the sample covariance of the DFTs. Such a test is usually inferior because it involves an overestimated scale parameter that leads to low size and power. To circumvent this shortcoming, this paper proposes two self-normalized statistics based on extreme value and partial sum of the sample covariance of the DFTs. Under certain regularity conditions, it is shown that the proposed tests converge to functionals of Brownian motion. Simulations and a data analysis demonstrate the outstanding performance of the proposed tests.

Citations: 0
Reduced-bias estimation of the extreme conditional tail expectation for Box–Cox transforms of heavy-tailed distributions
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-10 DOI: 10.1016/j.jspi.2024.106189
Michaël Allouche, Jonathan El Methni, Stéphane Girard

Conditional tail expectation (CTE) is a coherent risk measure defined as the mean of the loss distribution above a high quantile. The existence of the CTE, as well as the asymptotic properties of the associated estimators, requires integrability conditions that may be violated when dealing with heavy-tailed distributions. We introduce Box–Cox transforms of the CTE that have two benefits. First, they alleviate these theoretical issues. Second, they make it possible to recover a number of risk measures such as conditional tail expectation, expected shortfall, conditional value-at-risk or conditional tail variance. The construction of dedicated estimators is based on the investigation of the asymptotic relationship between Box–Cox transforms of the CTE and quantiles at extreme probability levels, as well as on an extrapolation formula established in the heavy-tailed context. We quantify and estimate the bias induced by the use of these approximations and then introduce reduced-bias estimators whose asymptotic properties are rigorously established. Their finite-sample properties are assessed in a simulation study and illustrated on real data, highlighting the practical interest of both the bias reduction and the Box–Cox transform.
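The two basic ingredients named in this abstract can be sketched as follows: the Box–Cox transform and a naive empirical CTE (the mean of the losses above the empirical quantile). This is an illustration of the definitions only, not the authors' extrapolated or reduced-bias estimator; the function names `box_cox` and `empirical_cte` and the choice of empirical quantile are assumptions made here.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, with log(x) as the lam = 0 limit."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def empirical_cte(losses, alpha):
    """Naive CTE: mean of the observations strictly above the empirical
    alpha-quantile, taken here as the ceil(alpha * n)-th order statistic."""
    xs = sorted(losses)
    n = len(xs)
    k = max(1, math.ceil(alpha * n))
    q = xs[k - 1]
    tail = [x for x in xs if x > q]
    if not tail:
        raise ValueError("no observations above the empirical quantile")
    return sum(tail) / len(tail)


losses = list(range(1, 11))          # 1, 2, ..., 10
cte_80 = empirical_cte(losses, 0.8)  # mean of {9, 10} = 9.5
```

At extreme levels (alpha close to 1) very few or no observations lie above the quantile, so this naive estimator breaks down; that is the regime where an extrapolation formula of the kind described in the abstract is needed.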

Citations: 0
Efficient inference of parent-of-origin effect using case-control mother–child genotype data
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-05-09 DOI: 10.1016/j.jspi.2024.106190
Yuang Tian, Hong Zhang, Alexandre Bureau, Hagit Hochner, Jinbo Chen

Parent-of-origin effect plays an important role in mammal development and disorder. Case-control mother–child pair genotype data can be used to detect parent-of-origin effect and is often convenient to collect in practice. Most existing methods for assessing parent-of-origin effect do not incorporate any covariates, which may be required to control for confounding factors. We propose to model the parent-of-origin effect through a logistic regression model, with predictors including maternal and child genotypes, parental origins, and covariates. The parental origins may not be fully inferred from genotypes of a target genetic marker, so we propose to use genotypes of markers tightly linked to the target marker to increase inference efficiency. A robust statistical inference procedure is developed based on a modified profile log-likelihood in a retrospective way. A computationally feasible expectation–maximization algorithm is devised to estimate all unknown parameters involved in the modified profile log-likelihood. This algorithm differs from the conventional expectation–maximization algorithm in the sense that it is based on a modified instead of the original profile log-likelihood function. The convergence of the algorithm is established under some mild regularity conditions. This expectation–maximization algorithm also allows convenient handling of missing child genotypes. Large sample properties, including weak consistency, asymptotic normality, and asymptotic efficiency, are established for the proposed estimator under some mild regularity conditions. Finite sample properties are evaluated through extensive simulation studies and the application to a real dataset.

Citations: 0
A dynamic count process
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-04-26 DOI: 10.1016/j.jspi.2024.106187
Namhyun Kim, Pipat Wongsa-art, Yingcun Xia

The current paper aims to complement the recent development of observation-driven models of dynamic counts with a parameter-driven one for a general case, particularly discrete two-parameter exponential family distributions. The paper proposes a finite semiparametric exponential mixture of SETAR processes for the conditional mean of counts to capture the nonlinearity and complexity. Because of the intrinsic latency of the conditional mean, the general additive state-space representation of dynamic counts is first proposed; stationarity and geometric ergodicity are then established under a mild set of conditions. We also propose to estimate the unknown parameters by quasi maximum likelihood estimation and establish the asymptotic properties of the quasi maximum likelihood estimators (QMLEs), particularly $\sqrt{T}$-consistency and normality, under a relatively mild set of conditions. Furthermore, the finite-sample properties of the QMLEs are investigated via simulation exercises, and the proposed process is illustrated by applying the method to the intraday transaction counts per minute of AstraZeneca stock.

Citations: 0
Consistency of the maximum likelihood estimator of population tree in a coalescent framework
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-04-21 DOI: 10.1016/j.jspi.2024.106172
Arindam RoyChoudhury

We present a proof of consistency of the maximum likelihood estimator (MLE) of population tree in a previously proposed coalescent model. As the model involves tree-topology as a parameter, the standard proof of consistency for continuous parameters does not directly apply. In addition to proving that a consistent sequence of MLE exists, we also prove that the overall MLE, computed by maximizing the likelihood over all tree-topologies, is also consistent. Thus, the MLE of tree-topology is consistent as well. The last result is important because local maxima occur in the likelihood of population trees, especially while maximizing the likelihood separately for each tree-topology. Even though MLE is known to be a dependable estimator under this model, our work proves its effectiveness with mathematical certainty.

Citations: 0
Augmented projection Wasserstein distances: Multi-dimensional projection with neural surface
IF 0.9 Zone 4 (Mathematics) Q2 Mathematics Pub Date: 2024-04-19 DOI: 10.1016/j.jspi.2024.106185
Miyu Sugimoto , Ryo Okano , Masaaki Imaizumi

The Wasserstein distance is a fundamental tool for comparing probability distributions and has found broad applications in various fields, including image generation using generative adversarial networks. Despite its useful properties, the performance of the Wasserstein distance degrades when the data are high-dimensional, a phenomenon known as the curse of dimensionality. To mitigate this issue, extensions of the Wasserstein distance have been developed, such as the sliced Wasserstein distance, which uses one-dimensional projections. However, such an extension loses information about the original data because of the linear projection onto the one-dimensional space. In this paper, we propose novel distances named augmented projection Wasserstein distances (APWDs) to address these issues, which utilize multi-dimensional projection with a nonlinear surface given by a neural network. The APWDs employ a two-step procedure: they first map data onto a nonlinear surface using a neural network, then linearly project the mapped data into a multidimensional space. We also give an algorithm to select a subspace for the multi-dimensional projection. The APWDs are computationally effective while preserving nonlinear information of data. We theoretically confirm that the APWDs mitigate the curse of dimensionality. Our experiments demonstrate the APWDs’ outstanding performance and robustness to noise, particularly in the context of nonlinear high-dimensional data.
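
For readers unfamiliar with the sliced Wasserstein distance mentioned in the abstract, the following is a minimal sketch of that baseline only; it is not the authors' APWD method, which adds a neural-network surface and a multi-dimensional projection. The function names (`wasserstein_1d`, `random_direction`, `sliced_wasserstein`) are hypothetical, and the sketch assumes two samples of equal size, with directions drawn uniformly at random and averaged by Monte Carlo.

```python
import math
import random


def wasserstein_1d(u, v, p=2):
    """Empirical p-Wasserstein distance between two equal-size 1-D samples."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling pairs the sorted values.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(u), sorted(v)))
    return (cost / len(u)) ** (1.0 / p)


def random_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    xs, ys: lists of points (each a list of floats) with equal length.
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Linear projection of every point onto the direction theta.
        proj_x = [sum(t * c for t, c in zip(theta, x)) for x in xs]
        proj_y = [sum(t * c for t, c in zip(theta, y)) for y in ys]
        total += wasserstein_1d(proj_x, proj_y, p) ** p
    return (total / n_projections) ** (1.0 / p)
```

Each projection reduces the problem to a sort, which is why the sliced variant is cheap; the information lost by projecting onto a single line is the limitation the abstract says the APWDs aim to address.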

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0378375824000429/pdfft?md5=d9eef2f8ec0fb76099ca4281dc2a0b63&pid=1-s2.0-S0378375824000429-main.pdf
Citations: 0