
Computational Statistics & Data Analysis: Latest Articles

Empirical likelihood in a partially linear single-index model with censored response data
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-10; DOI: 10.1016/j.csda.2023.107912
Liugen Xue

An empirical likelihood (EL) approach for a partially linear single-index model with censored response data is studied. A bias-corrected EL ratio is proposed, and its asymptotic chi-squared distribution is obtained. The result can be used directly to construct confidence regions for the regression parameters. Estimators of the regression parameters and the link function are constructed, and their asymptotic distributions are obtained. A confidence band for the link function is also constructed. The proposed method has two main features. First, the EL ratio is calibrated directly from within, rather than by multiplying an EL ratio by an adjustment factor, which reflects the nature of EL. Second, undersmoothing of the nonparametric functions is avoided, which ensures the √n-consistency of the parameter estimator. As a byproduct, EL and estimation for a single-index model with censored response data are studied. The performance of the bias-corrected EL is evaluated in simulation studies, and the proposed method is illustrated with a real data analysis.
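For readers unfamiliar with how an EL ratio yields a confidence region, the following is a minimal sketch of the classical empirical likelihood ratio for a scalar mean. It is not the bias-corrected ratio of the paper, whose auxiliary variables are not given in the abstract; the function name `el_log_ratio` and the bisection solver are illustrative assumptions.

```python
import math


def el_log_ratio(x, mu, max_iter=200):
    """Return -2 * log R(mu), the classical EL ratio statistic for a scalar mean."""
    z = [xi - mu for xi in x]
    if min(z) >= 0.0 or max(z) <= 0.0:
        # mu is not strictly inside the range of the data: the ratio is zero.
        return math.inf

    # The Lagrange multiplier lam solves g(lam) = sum z_i / (1 + lam * z_i) = 0
    # subject to positive weights, i.e. 1 + lam * z_i > 0 for every i.
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # g is strictly decreasing on (lo, hi), so plain bisection is enough.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:
            break
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
```

Under the usual chi-squared calibration with one degree of freedom, an approximate 95% confidence region for the mean would be the set of values `mu` with `el_log_ratio(x, mu) <= 3.841`.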

Citations: 0
Generalized latent space model for one-mode networks with awareness of two-mode networks
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-10; DOI: 10.1016/j.csda.2023.107915
Xinyan Fan, Kuangnan Fang, Dan Pu, Ruixuan Qin

Latent space models have been widely studied for one-mode networks, in which nodes of the same type connect with each other. In many applications, one-mode networks are observed along with two-mode networks, which reflect connections between different types of nodes and provide important information for understanding the one-mode network structure. However, classical one-mode latent space models have several limitations in incorporating two-mode networks. To address this gap, a generalized latent space model is proposed to capture common structures and heterogeneous connecting patterns across one-mode and two-mode networks. Specifically, each node is embedded with a latent vector and network-specific degree parameters that determine the connection probabilities between nodes. A projected gradient descent algorithm is developed to estimate the latent vectors and degree parameters. Moreover, the theoretical properties of the estimators are established, and it is proven that the estimation accuracy of the shared latent vectors can be improved by incorporating two-mode networks. Finally, simulation studies and applications to two real-world datasets demonstrate the usefulness of the proposed model.
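To make the role of the latent vectors and degree parameters concrete, here is a minimal sketch of a connection probability in an inner-product latent space model. The logit link and the additive form `alpha_i + alpha_j + <z_i, z_j>` are assumptions chosen for illustration; the abstract does not state the paper's exact specification, and `connection_prob` is a hypothetical name.

```python
import math


def connection_prob(z_i, z_j, alpha_i, alpha_j):
    """Edge probability from latent vectors and degree parameters (assumed logit link)."""
    # Linear predictor: degree effects plus the inner product of the latent vectors.
    eta = alpha_i + alpha_j + sum(a * b for a, b in zip(z_i, z_j))
    return 1.0 / (1.0 + math.exp(-eta))


# Shared latent vectors, but degree parameters specific to each network (hypothetical values).
z_a, z_b = [0.5, -0.2], [0.4, 0.1]
p_one_mode = connection_prob(z_a, z_b, alpha_i=-1.0, alpha_j=-0.5)
p_two_mode = connection_prob(z_a, z_b, alpha_i=0.3, alpha_j=-0.2)
```

In this sketch the same pair of latent vectors enters both networks, while the degree parameters differ by network, which is one way to read "common structures and heterogeneous connecting patterns".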

Citations: 0
Oracle-efficient estimation and trend inference in non-stationary time series with trend and heteroscedastic ARMA error
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-09; DOI: 10.1016/j.csda.2024.107917
Chen Zhong

Non-stationary time series often contain an unknown trend and unobserved error terms. The error terms in the proposed model consist of a smooth variance function and a latent stationary ARMA series, which allows heteroscedasticity across time points. A theoretically justified two-step B-spline estimation method is proposed for the trend and variance functions in the model, and residuals are then obtained by removing the trend and variance function estimators from the data. The maximum likelihood estimator (MLE) for the latent ARMA error coefficients based on the residuals is shown to be oracally efficient in the sense that it has the same asymptotic distribution as the infeasible MLE that would be available if the trend and variance function were known. In addition to the oracle efficiency, a kernel estimator is obtained for the trend function and shown to converge to the Gumbel distribution. It yields an asymptotically correct simultaneous confidence band (SCB) for the trend function, which can be used to test the specific form of the trend. A simulation-based procedure is proposed to implement the SCB, and simulation and real data analysis illustrate the finite-sample performance.

Citations: 0
Change point detection via feedforward neural networks with theoretical guarantees
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-09; DOI: 10.1016/j.csda.2023.107913
Houlin Zhou, Hanbing Zhu, Xuejun Wang

This article studies change point detection for the mean-shift change point model. A method is proposed to estimate the change point via feedforward neural networks, and the complete f-moment consistency of the proposed estimator is obtained. Numerical simulation results show that the proposed estimator performs better than the cumulative sum (CUSUM) type estimator, which is widely used in change point detection, especially when the mean-shift signal is small. Finally, the proposed method is demonstrated by an empirical analysis of a stock data set.
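As a point of reference, the CUSUM-type baseline mentioned in the abstract can be sketched in a few lines. This is the textbook unweighted CUSUM estimator for a single mean shift, not the neural network estimator proposed in the paper; the function name `cusum_change_point` is an illustrative assumption.

```python
def cusum_change_point(x):
    """Estimate a single mean-shift location as argmax_k |S_k - (k/n) * S_n|.

    Returns k, the number of observations before the estimated change.
    """
    n = len(x)
    total = sum(x)
    best_k, best_stat = 1, -1.0
    partial = 0.0
    for k in range(1, n):
        partial += x[k - 1]  # partial sum S_k of the first k observations
        stat = abs(partial - k * total / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k
```

For a series that jumps after its fifth observation, such as five zeros followed by five ones, the statistic peaks at `k = 5`. When the jump is small relative to the noise, the peak becomes flat, which is the regime where the abstract reports an advantage for the proposed estimator.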

Citations: 0
Group variable selection via group sparse neural network
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-29; DOI: 10.1016/j.csda.2023.107911
Xin Zhang, Junlong Zhao

Group variable selection is an important issue in high-dimensional data modeling, and most existing methods consider only the linear model. Therefore, a new method based on the deep neural network (DNN), an increasingly popular nonlinear method in both the statistics and deep learning communities, is proposed. The method is applicable to general nonlinear models, including the linear model as a special case. Specifically, a group sparse neural network (GSNN) is designed, where the definition of nonlinear group high-level features (NGHFs) is generalized to the network structure. A two-stage group sparse (TGS) algorithm is employed to induce group variable selection by performing group structure selection on the network. GSNN is promising for complex nonlinear systems with interactions and correlated predictors, overcoming the shortcomings of linear or marginal variable selection methods. Theoretical results on convergence and group-level selection consistency are also given. Simulation results and real data analysis demonstrate the superiority of the method.
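The abstract does not describe the TGS algorithm itself. As a rough illustration of how group-level sparsity is commonly induced, the sketch below shows the generic group-lasso penalty and its proximal step (group soft-thresholding) applied to the weights attached to one group of inputs. Both function names are hypothetical, and this generic device should not be read as the paper's algorithm.

```python
import math


def group_lasso_penalty(groups, lam):
    """lam times the sum of Euclidean norms, one norm per weight group."""
    return lam * sum(math.sqrt(sum(w * w for w in g)) for g in groups)


def group_soft_threshold(weights, lam):
    """Proximal step for lam * ||w||_2: shrink the whole group, or zero it out."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= lam:
        # The entire group is removed, which deselects its variables together.
        return [0.0] * len(weights)
    scale = 1.0 - lam / norm
    return [scale * w for w in weights]
```

Because the threshold acts on the norm of the whole group, the variables in a group are kept or dropped together, which is the behaviour group variable selection requires.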

Citations: 0
HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-28; DOI: 10.1016/j.csda.2023.107904
Cheng Wang, Haozhe Chen, Binyan Jiang

This paper investigates the efficient solution of penalized quadratic regressions in high-dimensional settings. A novel and efficient algorithm for ridge-penalized quadratic regression is proposed, leveraging the matrix structures of the regression with interactions. Additionally, an alternating direction method of multipliers (ADMM) framework is developed for penalized quadratic regression with general penalties, including both single and hybrid penalty functions. The approach simplifies the calculations to basic matrix-based operations, making it appealing in terms of both memory storage and computational complexity for solving penalized quadratic regressions in high-dimensional settings.

Citations: 0
Subgroup detection based on partially linear additive individualized model with missing data in response
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-20; DOI: 10.1016/j.csda.2023.107910
Tingting Cai, Jianbo Li, Qin Zhou, Songlou Yin, Riquan Zhang

Based on a partially linear additive individualized model, a fusion-penalized inverse probability weighted least squares method is proposed to detect subgroups when the response has missing data. First, the B-spline technique is used to approximate the unknown additive individualized functions, and an inverse probability weighted quadratic loss function is established with a fusion penalty on the differences of subject-wise B-spline coefficients. Second, minimizing this quadratic loss function yields estimates of the linear regression parameters and the individualized B-spline coefficients. With a proper tuning parameter, some differences in the penalty term are shrunk to zero, and the corresponding subjects are clustered into the same subgroup. Third, a clustering method is developed to automatically determine the subgroup membership of subjects with missing data. Fourth, large-sample properties of the resulting estimates are given under some regularity conditions. Finally, numerical studies are presented to illustrate the performance of the proposed subgroup detection method.

Citations: 0
Discrepancy between structured matrices in the power analysis of a separability test
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-13; DOI: 10.1016/j.csda.2023.107907
Katarzyna Filipiak, Daniel Klein, Monika Mokrzycka

An important task in the analysis of multivariate data is testing of the covariance matrix structure. In particular, for assessing separability, various tests have been proposed. However, the development of a method of measuring discrepancy between two covariance matrix structures, in relation to the study of the power of the test, remains an open problem. Therefore, a discrepancy measure is proposed such that for two arbitrary alternative hypotheses with the same value of discrepancy, the power of tests remains stable, while for increasing discrepancy the power increases. The basic hypothesis is related to the separable structure of the observation matrix under a doubly multivariate normal model, as assessed by the likelihood ratio and Rao score tests. It is shown that the particular one-parameter method and the Frobenius norm fail in the power analysis of tests, while the entropy and quadratic loss functions can be efficiently used to measure the discrepancy between separable and non-separable covariance structures for a multivariate normal distribution.
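For orientation, the three discrepancy functions named in the abstract are commonly written as follows for two p × p positive definite matrices. These are the textbook forms; the argument order and any normalization used in the paper are not stated in the abstract, so the notation below is an assumption.

```latex
\begin{align*}
d_F(\Sigma_1,\Sigma_2) &= \lVert \Sigma_1 - \Sigma_2 \rVert_F
  = \sqrt{\operatorname{tr}\!\left\{(\Sigma_1-\Sigma_2)^{\top}(\Sigma_1-\Sigma_2)\right\}}, \\
d_E(\Sigma_1,\Sigma_2) &= \operatorname{tr}\!\left(\Sigma_1\Sigma_2^{-1}\right)
  - \log\det\!\left(\Sigma_1\Sigma_2^{-1}\right) - p, \\
d_Q(\Sigma_1,\Sigma_2) &= \operatorname{tr}\!\left\{\left(\Sigma_1\Sigma_2^{-1} - I_p\right)^{2}\right\}.
\end{align*}
```

All three vanish when the two matrices coincide. The entropy and quadratic forms depend on the matrices only through the product of one with the inverse of the other, whereas the Frobenius norm depends on their raw difference and therefore changes with the scale of the entries.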

Citations: 0
A Laplace-based model with flexible tail behavior
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-12; DOI: 10.1016/j.csda.2023.107909
Cristina Tortora, Brian C. Franczak, Luca Bagnato, Antonio Punzo

The proposed multiple scaled contaminated asymmetric Laplace (MSCAL) distribution is an extension of the multivariate asymmetric Laplace distribution to allow for a different excess kurtosis on each dimension and for more flexible shapes of the hyper-contours. These peculiarities are obtained by working on the principal component (PC) space. The structure of the MSCAL distribution has the further advantage of allowing for automatic PC-wise outlier detection – i.e., detection of outliers separately on each PC – when convenient constraints on the parameters are imposed. The MSCAL is fitted using a Monte Carlo expectation-maximization (MCEM) algorithm that uses a Monte Carlo method to estimate the orthogonal matrix of eigenvectors. A simulation study is used to assess the proposed MCEM in terms of computational efficiency and parameter recovery. In a real data application, the MSCAL is fitted to a real data set containing the anthropometric measurements of monozygotic/dizygotic twins. Both a skewed bivariate subset of the full data, perturbed by some outlying points, and the full data are considered.

Open access PDF: https://www.sciencedirect.com/science/article/pii/S0167947323002207/pdfft?md5=d2a7615bc71ed59a59a646714a4b93c6&pid=1-s2.0-S0167947323002207-main.pdf
Citations: 0
Graph-based spatial segmentation of areal data
IF 1.8; Zone 3 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-12; DOI: 10.1016/j.csda.2023.107908
Vivien Goepp, Jan van de Kassteele

Smoothing is often used to improve the readability and interpretability of noisy areal data. However, there are many instances where the underlying quantity is discontinuous. For such cases, specific methods are needed to estimate the piecewise constant spatial process. A well-known approach in this setting is to segment the signal using the adjacency graph, as in the graph-based fused lasso. However, this method does not scale well to large graphs. A new method is introduced for piecewise constant spatial estimation that (i) is faster to compute on large graphs and (ii) yields sparser models than the fused lasso (for the same amount of regularization), resulting in estimates that are easier to interpret. The method is illustrated on simulated data and applied to real data on overweight prevalence in the Netherlands. Healthy and unhealthy zones are identified that cannot be explained by demographic or socio-economic characteristics alone. The method is found capable of identifying such zones and can assist policymakers with their health-improving strategies.
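The graph-based fused lasso that serves as the baseline here minimizes a squared-error fit plus an absolute-difference penalty over the edges of the adjacency graph. The sketch below only evaluates that objective for a candidate estimate; it is not the new method of the paper, and the function name and the edge-list representation are illustrative assumptions.

```python
def graph_fused_lasso_objective(y, beta, edges, lam):
    """0.5 * sum_i (y_i - beta_i)^2 + lam * sum over edges (i, j) of |beta_i - beta_j|."""
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    # Each edge joins two adjacent areas; equal values across an edge cost nothing.
    penalty = lam * sum(abs(beta[i] - beta[j]) for i, j in edges)
    return fit + penalty


# Three areas on a path graph 0 - 1 - 2 (hypothetical data).
y = [1.0, 1.0, 3.0]
edges = [(0, 1), (1, 2)]
piecewise = graph_fused_lasso_objective(y, [1.0, 1.0, 3.0], edges, lam=0.5)
constant = graph_fused_lasso_objective(y, [2.0, 2.0, 2.0], edges, lam=0.5)
```

The penalty counts only the jumps between adjacent areas, so a solution with few distinct values across neighbours is a segmentation of the map into zones.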

Citations: 0