Computational Statistics & Data Analysis: Latest Articles

Minimum profile Hellinger distance estimation of general covariate models
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108054
Bowei Ding, Rohana J. Karunamuni, Jingjing Wu

Covariate models, such as polynomial regression models, generalized linear models, and heteroscedastic models, are widely used in statistical applications. The importance of such models in statistical analysis is evident from the ever-increasing rate at which articles on covariate models appear in the statistical literature. Because of their flexibility, covariate models are increasingly used as a convenient way to model data that consist of a response variable and one or more covariates that affect the response. Efficient and robust estimators for broadly defined semiparametric covariate models are investigated, and for this purpose the minimum distance approach is employed. In general, minimum distance estimators are automatically robust with respect to the stability of the quantity being estimated. In particular, minimum Hellinger distance estimation for parametric models produces estimators that are asymptotically efficient at the model density and simultaneously possess excellent robustness properties. For semiparametric covariate models, the minimum Hellinger distance method is extended and a minimum profile Hellinger distance estimator is proposed. Its asymptotic properties, such as consistency, are studied, and its finite-sample performance and robustness are examined using Monte Carlo simulations and three real data analyses. Additionally, a computing algorithm is developed to ease the computation of the estimator.
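
The robustness claim for parametric minimum Hellinger distance estimation can be illustrated with a small sketch. This is the classical parametric version for a normal location model, not the profile estimator proposed in the paper. The function names, the kernel bandwidth `h`, the grid step, and the toy data are all assumptions made for illustration. The estimate maximizes the affinity, the integral of sqrt(f_hat * f_theta), which is equivalent to minimizing the Hellinger distance between a kernel density estimate f_hat and the model density f_theta.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h (hypothetical choice)."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)


def mhd_location(data, sigma=1.0, h=0.5, step=0.05):
    """Grid-search minimum Hellinger distance estimate of theta in N(theta, sigma^2).

    Returns (theta_hat, affinity), where affinity approximates the integral
    of sqrt(f_hat(x) * f_theta(x)) by a Riemann sum on a regular grid.
    """
    lo, hi = min(data) - 5.0, max(data) + 5.0
    n_grid = int(round((hi - lo) / step)) + 1
    grid = [lo + step * i for i in range(n_grid)]
    root_f = [math.sqrt(kde(x, data, h)) for x in grid]  # sqrt of the KDE, computed once

    best_theta, best_aff = None, -1.0
    for theta in grid:  # candidate locations taken from the same grid
        aff = step * sum(
            rf * math.sqrt(normal_pdf(x, theta, sigma))
            for x, rf in zip(grid, root_f)
        )
        if aff > best_aff:
            best_theta, best_aff = theta, aff
    return best_theta, best_aff


if __name__ == "__main__":
    sample = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]  # toy data with one gross outlier
    theta_hat, affinity = mhd_location(sample)
    print("sample mean:", sum(sample) / len(sample))
    print("MHD location estimate:", theta_hat)
```

In this toy sample the outlier at 10 pulls the sample mean well away from the bulk of the data, while the Hellinger-based estimate stays near the centre of the five clustered points.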

Volume 202, Article 108054
PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001385/pdfft?md5=cefa2d178122667194291a858ff4b934&pid=1-s2.0-S0167947324001385-main.pdf
Citations: 0
Robust direction estimation in single-index models via cumulative divergence
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108052
Shuaida He, Jiarui Zhang, Xin Chen

In this paper, we address direction estimation in single-index models, with a focus on heavy-tailed data applications. Our method utilizes cumulative divergence to directly capture the conditional mean dependence between the response variable and the index predictor, resulting in a model-free property that obviates the need for initial link function estimation. Furthermore, our approach allows heavy-tailed predictors and is robust against the presence of outliers, leveraging the rank-based nature of cumulative divergence. We establish theoretical properties for our proposal under mild regularity conditions and illustrate its solid performance through comprehensive simulations and real data analysis.

Volume 202, Article 108052
Citations: 0
A Bayesian cluster validity index
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108053
Onthada Preedasawakul, Nathakhun Wiroonsri

Selecting the appropriate number of clusters is a critical step in applying clustering algorithms. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal number of clusters within a dataset. However, users may not always seek the absolute optimal number of clusters but rather a secondary option that better aligns with their specific applications. This realization has led us to introduce a Bayesian cluster validity index (BCVI), which builds upon existing indices. The BCVI utilizes either Dirichlet or generalized Dirichlet priors, resulting in the same posterior distribution. The proposed BCVI is evaluated using the Calinski–Harabasz, CVNN, Davies–Bouldin, silhouette, Starczewski, and Wiroonsri indices for hard clustering and the KWON2, Wiroonsri–Preedasawakul, and Xie–Beni indices for soft clustering as underlying indices. The performance of the proposed BCVI is compared with that of the original underlying indices. The BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters. To illustrate this, experiments classified into three different scenarios are conducted. Additionally, the practical applicability of the proposed approach is demonstrated on real-world datasets, such as MRI brain tumor images. These tools are available in the R package ‘BayesCVI’.
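
The idea of combining a prior over the number of clusters with evidence from an underlying index can be sketched with a Dirichlet-style conjugate update. The abstract does not give the formula, so everything in this sketch is an assumption: the function name, the min-shift normalization of the index values into weights `r`, and the pseudo-sample size `n` are hypothetical. It should not be read as the BayesCVI package's API or as the paper's exact definition.

```python
def bayes_cvi_sketch(index_values, alpha, n, larger_is_better=True):
    """Posterior-mean weights for each candidate number of clusters.

    index_values : underlying CVI value for each candidate k
    alpha        : Dirichlet prior parameters, one per candidate k
    n            : pseudo-sample size controlling how much the index counts
                   (hypothetical tuning constant)
    """
    vals = list(index_values) if larger_is_better else [-v for v in index_values]
    lowest = min(vals)
    shifted = [v - lowest for v in vals]
    total = sum(shifted)
    if total == 0:  # all candidates tie: use uniform evidence
        r = [1.0 / len(vals)] * len(vals)
    else:
        r = [s / total for s in shifted]  # assumed normalization to weights summing to 1
    alpha0 = sum(alpha)
    # Dirichlet posterior mean when n * r_k is treated as pseudo-counts
    return [(a + n * rk) / (alpha0 + n) for a, rk in zip(alpha, r)]


if __name__ == "__main__":
    candidates = [2, 3, 4, 5]
    cvi = [0.2, 0.5, 0.4, 0.1]       # toy index values; k = 3 scores highest
    prior = [1.0, 1.0, 20.0, 1.0]    # user expertise strongly favours k = 4
    post = bayes_cvi_sketch(cvi, prior, n=10)
    print(dict(zip(candidates, post)))
```

With these toy numbers the raw index prefers k = 3, but the prior concentrated on k = 4 moves the largest posterior weight to k = 4. That is the kind of user-guided secondary choice the abstract describes.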

Volume 202, Article 108053
Citations: 0
On the use of the cumulant generating function for inference on time series
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-28 | DOI: 10.1016/j.csda.2024.108044
A. Moor, D. La Vecchia, E. Ronchetti

Innovative inference procedures for analyzing time series data are introduced. The methodology covers density approximation and composite hypothesis testing based on Whittle's estimator, which is a widely applied M-estimator in the frequency domain. Its core feature involves the cumulant generating function of Whittle's score obtained using an approximated distribution of the periodogram ordinates. A testing algorithm not only significantly expands the applicability of the state-of-the-art saddlepoint test, but also maintains the numerical accuracy of the saddlepoint approximation. Connections are made with three other prevalent frequency domain techniques: the bootstrap, empirical likelihood, and exponential tilting. Numerical examples using both simulated and real data illustrate the advantages and accuracy of the saddlepoint methods.
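
For readers unfamiliar with Whittle's estimator, the sketch below fits an AR(1) model in the frequency domain by minimizing the Whittle objective over a grid. It illustrates only the point estimator that the methodology builds on. The saddlepoint approximation and the testing algorithm from the paper are not attempted. The AR(1) model, the grid search, and all function names are assumptions chosen to keep the example short.

```python
import cmath
import math
import random


def simulate_ar1(phi, n, rng):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def periodogram(x):
    """Periodogram ordinates at the Fourier frequencies 2*pi*j/n, j = 1..(n-1)//2."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    freqs, ords = [], []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centred))
        freqs.append(w)
        ords.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, ords


def whittle_ar1(x, grid_size=199):
    """Whittle estimate of (phi, sigma^2) for AR(1), with sigma^2 profiled out.

    The spectral density is f(w) = sigma^2 * g(w) / (2*pi) with
    g(w) = 1 / (1 - 2*phi*cos(w) + phi^2).
    """
    freqs, ords = periodogram(x)
    m = len(ords)
    best = (None, None, float("inf"))
    for k in range(1, grid_size + 1):
        phi = -1.0 + 2.0 * k / (grid_size + 1)  # grid strictly inside (-1, 1)
        g = [1.0 / (1.0 - 2.0 * phi * math.cos(w) + phi * phi) for w in freqs]
        sigma2 = sum(2.0 * math.pi * i / gj for i, gj in zip(ords, g)) / m
        objective = m * math.log(sigma2) + sum(math.log(gj) for gj in g)
        if objective < best[2]:
            best = (phi, sigma2, objective)
    return best[0], best[1]


if __name__ == "__main__":
    series = simulate_ar1(0.6, 400, random.Random(1))
    phi_hat, sigma2_hat = whittle_ar1(series)
    print("phi_hat:", phi_hat, "sigma2_hat:", sigma2_hat)
```

The estimate is the M-estimator defined by the Whittle score. The paper's inference procedures concern the distribution of that score rather than the point estimate shown here.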

Volume 201, Article 108044
PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001282/pdfft?md5=9b20083653468ba252743f2a96727926&pid=1-s2.0-S0167947324001282-main.pdf
Citations: 0
Minimax rates of convergence for sliced inverse regression with differential privacy
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108041
Wenbiao Zhao, Xuehu Zhu, Lixing Zhu

Sliced inverse regression (SIR) is a highly efficient paradigm for dimension reduction that replaces high-dimensional covariates with a limited number of linear combinations. This paper focuses on the implementation of the classical SIR approach integrated with a Gaussian differential privacy mechanism to estimate the central space while preserving privacy. We illustrate the tradeoff between statistical accuracy and privacy in sufficient dimension reduction problems under both the classical low-dimensional and modern high-dimensional settings. Additionally, we achieve the minimax rate of the proposed estimator under a Gaussian differential privacy constraint and illustrate that this rate is also optimal for multiple index models with bounded dimension of the central space. Extensive numerical studies on synthetic data sets are conducted to assess the effectiveness of the proposed technique in finite sample scenarios, and a real data analysis is presented to showcase its practical application.
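
The accuracy versus privacy tradeoff can be made concrete with the basic Gaussian mechanism. The sketch assumes the mu-GDP notion of Gaussian differential privacy, under which adding N(0, (sensitivity / mu)^2) noise to a statistic gives a mu-GDP release. It privatizes a clipped sample mean only, not the SIR estimator from the paper. The function names and the clipping bound are hypothetical.

```python
import random


def gdp_noise_scale(sensitivity, mu):
    """Standard deviation of Gaussian noise for a mu-GDP release of a statistic
    whose L2 sensitivity is `sensitivity` (smaller mu means stronger privacy)."""
    return sensitivity / mu


def private_mean(values, bound, mu, rng):
    """Release a noisy mean of values clipped to [-bound, bound].

    Replacing one record changes the clipped mean by at most 2 * bound / n,
    which is the sensitivity used to calibrate the noise.
    Returns (noisy_mean, sigma).
    """
    clipped = [max(-bound, min(bound, v)) for v in values]
    n = len(clipped)
    sigma = gdp_noise_scale(2.0 * bound / n, mu)
    return sum(clipped) / n + rng.gauss(0.0, sigma), sigma


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    for mu in (0.25, 0.5, 1.0):
        noisy, sigma = private_mean(data, bound=1.0, mu=mu, rng=rng)
        print(f"mu={mu}: noise sd={sigma:.3f}, private mean={noisy:.3f}")
```

The noise standard deviation scales as 1 / (n * mu), so a stricter privacy level or a smaller sample directly inflates the estimation error.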

Volume 201, Article 108041
PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001257/pdfft?md5=cab1d33929cc2c1071e939e0580ca683&pid=1-s2.0-S0167947324001257-main.pdf
Citations: 0
Test for the mean of high-dimensional functional time series
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108040
Lin Yang, Zhenghui Feng, Qing Jiang

The one-sample test and two-sample test for the mean of high-dimensional functional time series are considered in this study. The proposed tests are built on the dimension-wise max-norm of the sum of squares of diverging projections. The null distribution of the test statistics is investigated using normal approximation, and the asymptotic behavior under the alternative is studied. The approach is robust to cross-series dependence of unknown form and magnitude. To approximate the critical values, a blockwise wild bootstrap method for functional time series is employed. Both fully and partially observed data are analyzed in theoretical research and numerical studies. Evidence from simulation studies and an IT stock data case study demonstrates the usefulness of the test in practice. The proposed methods have been implemented in an R package.

Volume 201, Article 108040
PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001245/pdfft?md5=a3ba37187b9ba57e45af87f61b64c9c8&pid=1-s2.0-S0167947324001245-main.pdf
Citations: 0
Community influence analysis in social networks
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108037
Yuanxing Chen, Kuangnan Fang, Wei Lan, Chih-Ling Tsai, Qingzhao Zhang

Heterogeneous influence detection across network nodes is an important task in network analysis. A community influence model (CIM) is proposed to allow nodes to be classified into different communities (i.e., clusters or groups) such that the nodes within the same community share a common influence parameter. Employing the quasi-maximum likelihood approach, together with a fused lasso-type penalty, both the number of communities and the influence parameters can be estimated without imposing any specific distribution assumption on the error terms. The resulting estimators are shown to enjoy the oracle property; namely, they perform as well as if the true underlying network structure were known in advance. The proposed approach is also applicable to identifying influential nodes in a homogeneous setting. The performance of our method is illustrated via simulation studies and two empirical examples using stock data and coauthor citation data, respectively.

Volume 202, Article 108037
Citations: 0
Feasible model-based principal component analysis: Joint estimation of rank and error covariance matrix
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108042
Tak-Shing T. Chan, Alex Gibberd

Real-world inputs to principal component analysis are often corrupted by temporally or spatially correlated errors. There are several methods to mitigate this, e.g., generalized least-squares matrix decomposition and maximum likelihood approaches; however, they all require the number of components or the error covariances to be known in advance, rendering the methods infeasible. To address this issue, a novel method is developed which estimates the number of components and the error covariances at the same time. The method is based on working covariance models, an idea adapted from generalized estimating equations, where the user only specifies the structural form of the error covariances. If the structural form is also unknown, working covariance selection can be used to search for the best structure from a user-defined library. Experiments on synthetic and real data confirm the efficacy of the proposed approach.

Volume 201, Article 108042
PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001269/pdfft?md5=ac444320856de4406b797dc038c23d54&pid=1-s2.0-S0167947324001269-main.pdf
Citations: 0
Hierarchical Bayesian spectral regression with shape constraints for multi-group data
IF 1.5 | Zone 3 (Mathematics) | Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-08 | DOI: 10.1016/j.csda.2024.108036
Peter Lenk, Jangwon Lee, Dongu Han, Jichan Park, Taeryon Choi

We propose a hierarchical Bayesian (HB) model for multi-group analysis with group-specific, flexible regression functions. The lower-level (within group) and upper-level (between groups) regression functions have hierarchical Gaussian process priors. HB smoothing priors are developed for the spectral coefficients. The HB priors smooth the estimated functions within and between groups. The HB model is particularly useful when data within groups are sparse because it shares information across groups and provides more accurate estimates than fitting separate nonparametric models to each group. The proposed model also allows shape constraints, such as monotone, U-shaped, S-shaped, and multi-modal constraints. When appropriate, shape constraints improve estimation by recognizing violations of the shape constraints as noise. The model is illustrated by two examples: monotone growth curves for children, and happiness as a convex, U-shaped function of age in multiple countries. Various basis functions could also be used, and the paper also implements versions with B-splines and orthogonal polynomials.

我们提出了一种分层贝叶斯(HB)模型,用于多组分析,具有针对特定组的灵活回归函数。下层(组内)和上层(组间)回归函数具有分层高斯过程先验。为频谱系数开发了 HB 平滑先验。HB 先验可平滑组内和组间的估计函数。在组内数据稀少的情况下,HB 模型尤其有用,因为它可以共享各组间的信息,并且比为每个组分别拟合非参数模型提供更精确的估计值。建议的模型还允许形状约束,如单调、U 形和 S 形以及多模式约束。在适当的情况下,形状约束可将违反形状约束的行为视为噪声,从而改进估计结果。该模型通过两个例子进行了说明:儿童的单调增长曲线,以及多个国家的幸福感与年龄的凸 U 型函数。还可以使用各种基函数,本文还使用 B-样条函数和正交多项式实现了各种版本。
{"title":"Hierarchical Bayesian spectral regression with shape constraints for multi-group data","authors":"Peter Lenk ,&nbsp;Jangwon Lee ,&nbsp;Dongu Han ,&nbsp;Jichan Park ,&nbsp;Taeryon Choi","doi":"10.1016/j.csda.2024.108036","DOIUrl":"10.1016/j.csda.2024.108036","url":null,"abstract":"<div><p>We propose a hierarchical Bayesian (HB) model for multi-group analysis with group–specific, flexible regression functions. The lower–level (within group) and upper–level (between groups) regression functions have hierarchical Gaussian process priors. HB smoothing priors are developed for the spectral coefficients. The HB priors smooth the estimated functions within and between groups. The HB model is particularly useful when data within groups are sparse because it shares information across groups, and provides more accurate estimates than fitting separate nonparametric models to each group. The proposed model also allows shape constraints, such as monotone, U and S–shaped, and multi-modal constraints. When appropriate, shape constraints improve estimation by recognizing violations of the shape constraints as noise. The model is illustrated by two examples: monotone growth curves for children, and happiness as a convex, U-shaped function of age in multiple countries. 
Various basis functions could also be used, and the paper also implements versions with B-splines and orthogonal polynomials.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"200 ","pages":"Article 108036"},"PeriodicalIF":1.5,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimal split^k-plot designs
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-07-31 DOI: 10.1016/j.csda.2024.108028
Mathias Born, Peter Goos

Completely randomized designs are often infeasible due to the hard-to-change nature of one or more experimental factors. In those cases, restrictions are imposed on the order of the experimental tests. The resulting experimental designs are often split-plot or split-split-plot designs in which the levels of certain hard-to-change factors are varied only a limited number of times. In agricultural machinery optimization, the number of hard-to-change factors is so large and the available time for experimentation is so short that split-plot or split-split-plot designs are infeasible as well. The only feasible kinds of designs are generalizations of split-split-plot designs, which are referred to as split^k-designs, where k is larger than 2. The coordinate-exchange algorithm is extended to construct optimal split^k-plot designs and the added value of the algorithm is demonstrated by applying it to an experiment involving a self-propelled forage harvester. The optimal design generated using the extended algorithm is substantially more efficient than the design that was actually used. Update formulas for the determinant and the inverse of the information matrix speed up the coordinate-exchange algorithm, making it feasible for large designs.

Citations: 0