Computational Statistics & Data Analysis: latest articles

A variational inference framework for inverse problems
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-09-16 | DOI: 10.1016/j.csda.2024.108055
Luca Maestrini, Robert G. Aykroyd, Matt P. Wand
A framework is presented for fitting inverse problem models via variational Bayes approximations. This methodology guarantees flexibility to statistical model specification for a broad range of applications, good accuracy and reduced model fitting times. The message passing and factor graph fragment approach to variational Bayes that is also described facilitates streamlined implementation of approximate inference algorithms and allows for supple inclusion of numerous response distributions and penalizations into the inverse problem model. Models for one- and two-dimensional response variables are examined and an infrastructure is laid down where efficient algorithm updates based on nullifying weak interactions between variables can also be derived for inverse problems in higher dimensions. An image processing application and a simulation exercise motivated by biomedical problems reveal the computational advantage offered by efficient implementation of variational Bayes over Markov chain Monte Carlo.
Computational Statistics & Data Analysis, Volume 202, Article 108055. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001397/pdfft?md5=85a537d37759205b0ecbf4270e7221f7&pid=1-s2.0-S0167947324001397-main.pdf
Citations: 0
Spline regression with automatic knot selection
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-09-16 | DOI: 10.1016/j.csda.2024.108043
Vivien Goepp, Olivier Bouaziz, Grégory Nuel
Spline regression has proven to be a useful tool for nonparametric regression. The flexibility of this function family rests on basepoints, called knots, that define shifts in the behavior of the function. The question of choosing an adequate number of knots and their placement is usually sidestepped by penalizing the spline's overall smoothness (e.g. P-splines). However, there are areas of application where finding the best knot placement is of interest. A new method is introduced for automatically selecting knots in spline regression. The approach consists of setting many initial knots and fitting the spline regression through a penalized likelihood procedure called adaptive ridge, which discards the least relevant knots. The method, called A-splines (for adaptive splines), compares favorably with other knot selection methods: it runs much faster (roughly 10 to 400 times faster) than comparable methods and has nearly equal predictive performance. A-splines are applied to both simulated and real datasets.
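The adaptive ridge idea mentioned in this abstract can be illustrated in isolation. The sketch below is a toy version that assumes an orthonormal design, so each coefficient updates independently. The weight formula w = 1 / (b^2 + delta^2), the function name `adaptive_ridge_orthonormal`, and the values of `lam` and `delta` are assumptions made for illustration; this is not the authors' A-spline code, which works with spline coefficients and knots rather than raw observations.

```python
def adaptive_ridge_orthonormal(y, lam=1.0, delta=1e-5, n_iter=100):
    """Iteratively reweighted ridge for an orthonormal design (toy sketch).

    Each pass solves min_b (y_j - b_j)^2 + lam * w_j * b_j^2 in closed form,
    then refreshes w_j = 1 / (b_j^2 + delta^2), so that w_j * b_j^2 behaves
    like an indicator of b_j != 0 (an L0-like penalty).
    """
    b = list(y)  # start from the unpenalized solution
    for _ in range(n_iter):
        # Small coefficients receive huge weights and are pushed toward zero.
        w = [1.0 / (bj * bj + delta * delta) for bj in b]
        # Closed-form weighted ridge update, one coefficient at a time.
        b = [yj / (1.0 + lam * wj) for yj, wj in zip(y, w)]
    return b


if __name__ == "__main__":
    coefs = adaptive_ridge_orthonormal([3.0, 1.0, -4.0, 0.5], lam=1.0)
    print([round(c, 3) for c in coefs])
```

In this toy setting, inputs with magnitude below roughly 2 * sqrt(lam) are driven to essentially zero, while larger ones survive with some shrinkage, which mirrors how weakly supported knots would be discarded.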
Computational Statistics & Data Analysis, Volume 202, Article 108043.
Citations: 0
Beta-CoRM: A Bayesian approach for n-gram profiles analysis
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-09-10 | DOI: 10.1016/j.csda.2024.108056
José A. Perusquía, Jim E. Griffin, Cristiano Villa

n-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models for n-gram profiles used as binary attributes has been designed to address this. The flexibility of the proposed modelling allows a straightforward approach to feature selection in the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure; it is applied to synthetic and real data scenarios, and the results show that feature selection can improve classification accuracy.

Computational Statistics & Data Analysis, Volume 202, Article 108056. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001403/pdfft?md5=9000ddccd99ed2327e978f13456b5381&pid=1-s2.0-S0167947324001403-main.pdf
Citations: 0
Minimum profile Hellinger distance estimation of general covariate models
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108054
Bowei Ding, Rohana J. Karunamuni, Jingjing Wu

Covariate models, such as polynomial regression models, generalized linear models, and heteroscedastic models, are widely used in statistical applications. The importance of such models in statistical analysis is abundantly clear from the ever-increasing rate at which articles on covariate models are appearing in the statistical literature. Because of their flexibility, covariate models are increasingly being exploited as a convenient way to model data that consist of both a response variable and one or more covariates that affect the outcome of the response variable. Efficient and robust estimates for broadly defined semiparametric covariate models are investigated, and for this purpose the minimum distance approach is employed. In general, minimum distance estimators are automatically robust with respect to the stability of the quantity being estimated. In particular, minimum Hellinger distance estimation for parametric models produces estimators that are asymptotically efficient at the model density and simultaneously possess excellent robustness properties. For semiparametric covariate models, the minimum Hellinger distance method is extended and a minimum profile Hellinger distance estimator is proposed. Its asymptotic properties such as consistency are studied, and its finite-sample performance and robustness are examined by using Monte Carlo simulations and three real data analyses. Additionally, a computing algorithm is developed to ease the computation of the estimator.
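The robustness claim for minimum Hellinger distance estimation in parametric models can be seen in a small example. The sketch below covers only the classical parametric case (a Poisson mean fitted to count data by grid search), not the profile estimator for semiparametric models proposed in the paper. The data, the grid, and the function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def hellinger_distance(p, q):
    """Hellinger distance between two pmfs given as dicts {value: probability}."""
    support = set(p) | set(q)
    affinity = sum(math.sqrt(p.get(k, 0.0) * q.get(k, 0.0)) for k in support)
    return math.sqrt(max(0.0, 1.0 - affinity))


def poisson_pmf(k, lam):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mhd_poisson(data, grid):
    """Grid-search minimum Hellinger distance estimate of a Poisson mean."""
    n = len(data)
    empirical = {k: c / n for k, c in Counter(data).items()}

    def affinity(lam):
        # Only observed values contribute: elsewhere the empirical pmf is zero.
        return sum(math.sqrt(f * poisson_pmf(k, lam)) for k, f in empirical.items())

    # Minimizing the Hellinger distance is the same as maximizing the affinity.
    return max(grid, key=affinity)


if __name__ == "__main__":
    # Hypothetical counts with a single gross outlier (the value 50).
    data = [2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 2, 3, 2, 1, 3, 4, 2, 3, 2, 50]
    grid = [i / 100.0 for i in range(10, 1001)]
    print("sample mean:", sum(data) / len(data))
    print("MHD estimate:", mhd_poisson(data, grid))
```

The sample mean (the Poisson maximum likelihood estimate) is pulled far upward by the single outlier, whereas the Hellinger-based estimate stays near the bulk of the data, because the outlier contributes almost nothing to the affinity.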

Computational Statistics & Data Analysis, Volume 202, Article 108054. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001385/pdfft?md5=cefa2d178122667194291a858ff4b934&pid=1-s2.0-S0167947324001385-main.pdf
Citations: 0
Robust direction estimation in single-index models via cumulative divergence
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108052
Shuaida He, Jiarui Zhang, Xin Chen

In this paper, we address direction estimation in single-index models, with a focus on heavy-tailed data applications. Our method utilizes cumulative divergence to directly capture the conditional mean dependence between the response variable and the index predictor, resulting in a model-free property that obviates the need for initial link function estimation. Furthermore, our approach allows heavy-tailed predictors and is robust against the presence of outliers, leveraging the rank-based nature of cumulative divergence. We establish theoretical properties for our proposal under mild regularity conditions and illustrate its solid performance through comprehensive simulations and real data analysis.

Computational Statistics & Data Analysis, Volume 202, Article 108052.
Citations: 0
A Bayesian cluster validity index
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-30 | DOI: 10.1016/j.csda.2024.108053
Onthada Preedasawakul, Nathakhun Wiroonsri

Selecting the appropriate number of clusters is a critical step in applying clustering algorithms. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal number of clusters within a dataset. However, users may not always seek the absolute optimal number of clusters but rather a secondary option that better aligns with their specific applications. This realization has led us to introduce a Bayesian cluster validity index (BCVI), which builds upon existing indices. The BCVI utilizes either Dirichlet or generalized Dirichlet priors, resulting in the same posterior distribution. The proposed BCVI is evaluated using the Calinski-Harabasz, CVNN, Davies–Bouldin, silhouette, Starczewski, and Wiroonsri indices for hard clustering and the KWON2, Wiroonsri–Preedasawakul, and Xie–Beni indices for soft clustering as underlying indices. The performance of the proposed BCVI is compared with that of the original underlying indices. The BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters. To illustrate this, experiments classified into three different scenarios are conducted. Additionally, the practical applicability of the proposed approach is demonstrated on real-world datasets, such as MRI brain tumor images. These tools are published in the recent R package ‘BayesCVI’.

Computational Statistics & Data Analysis, Volume 202, Article 108053.
Citations: 0
On the use of the cumulant generating function for inference on time series
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-28 | DOI: 10.1016/j.csda.2024.108044
A. Moor, D. La Vecchia, E. Ronchetti

Innovative inference procedures for analyzing time series data are introduced. The methodology covers density approximation and composite hypothesis testing based on Whittle's estimator, which is a widely applied M-estimator in the frequency domain. Its core feature involves the cumulant generating function of Whittle's score obtained using an approximated distribution of the periodogram ordinates. A testing algorithm not only significantly expands the applicability of the state-of-the-art saddlepoint test, but also maintains the numerical accuracy of the saddlepoint approximation. Connections are made with three other prevalent frequency domain techniques: the bootstrap, empirical likelihood, and exponential tilting. Numerical examples using both simulated and real data illustrate the advantages and accuracy of the saddlepoint methods.
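For readers unfamiliar with Whittle's estimator, the sketch below shows its basic mechanics on a simulated AR(1) series: compute the periodogram at the Fourier frequencies, then minimize the Whittle criterion over the autoregressive parameter. The AR(1) model, the grid search, the seed, and all function names are assumptions chosen for illustration. The saddlepoint approximation and testing algorithm of the paper are not reproduced here.

```python
import cmath
import math
import random


def simulate_ar1(phi, n, seed=0, burn=200):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:  # discard the burn-in so the start value is forgotten
            out.append(x)
    return out


def periodogram(x):
    """Periodogram ordinates at the Fourier frequencies 2*pi*j/n, 0 < j < n/2."""
    n = len(x)
    freqs, ords = [], []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum(xt * cmath.exp(-1j * w * t) for t, xt in enumerate(x))
        freqs.append(w)
        ords.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, ords


def whittle_objective(phi, freqs, ords):
    """Whittle criterion for AR(1), with the innovation variance profiled out."""
    # AR(1) spectral shape: g(w) = 1 / |1 - phi * exp(-i w)|^2
    g = [1.0 / (1.0 - 2.0 * phi * math.cos(w) + phi * phi) for w in freqs]
    m = len(freqs)
    scale = sum(i_j / g_j for i_j, g_j in zip(ords, g)) / m
    return math.log(scale) + sum(math.log(g_j) for g_j in g) / m


def whittle_ar1(x):
    """Grid-search minimizer of the Whittle criterion over phi in (-1, 1)."""
    freqs, ords = periodogram(x)
    grid = [k / 100.0 for k in range(-95, 96)]
    return min(grid, key=lambda phi: whittle_objective(phi, freqs, ords))


if __name__ == "__main__":
    series = simulate_ar1(phi=0.6, n=400, seed=1)
    print("Whittle estimate of phi:", whittle_ar1(series))
```

A grid search keeps the sketch dependency-free; in practice a numerical optimizer would replace it, and a fast Fourier transform would replace the direct sum in `periodogram`.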

Computational Statistics & Data Analysis, Volume 201, Article 108044. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001282/pdfft?md5=9b20083653468ba252743f2a96727926&pid=1-s2.0-S0167947324001282-main.pdf
Citations: 0
Minimax rates of convergence for sliced inverse regression with differential privacy
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108041
Wenbiao Zhao, Xuehu Zhu, Lixing Zhu

Sliced inverse regression (SIR) is a highly efficient paradigm used for the purpose of dimension reduction by replacing high-dimensional covariates with a limited number of linear combinations. This paper focuses on the implementation of the classical SIR approach integrated with a Gaussian differential privacy mechanism to estimate the central space while preserving privacy. We illustrate the tradeoff between statistical accuracy and privacy in sufficient dimension reduction problems under both the classical low-dimensional and modern high-dimensional settings. Additionally, we achieve the minimax rate of the proposed estimator with Gaussian differential privacy constraint and illustrate that this rate is also optimal for multiple index models with bounded dimension of the central space. Extensive numerical studies on synthetic data sets are conducted to assess the effectiveness of the proposed technique in finite sample scenarios, and a real data analysis is presented to showcase its practical application.
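The accuracy versus privacy tradeoff mentioned in this abstract can be made concrete with the basic Gaussian mechanism: adding normal noise with standard deviation equal to the sensitivity divided by mu yields mu-Gaussian differential privacy. The sketch below applies this to a clipped mean, the kind of summary statistic SIR builds on. The abstract does not say how the paper injects noise into SIR, so the clipping bound, the function names, and the choice of statistic are all assumptions for illustration.

```python
import random


def gdp_noise_sd(sensitivity, mu):
    """Noise standard deviation for which the Gaussian mechanism is mu-GDP."""
    return sensitivity / mu


def private_clipped_mean(values, bound, mu, rng):
    """Release the mean of values clipped to [-bound, bound] under mu-GDP."""
    n = len(values)
    clipped = [max(-bound, min(bound, v)) for v in values]
    # Replacing one record changes the clipped mean by at most 2 * bound / n.
    sensitivity = 2.0 * bound / n
    noise = rng.gauss(0.0, gdp_noise_sd(sensitivity, mu))
    return sum(clipped) / n + noise


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.3, 1.0) for _ in range(500)]
    for mu in (0.1, 1.0, 10.0):
        # Smaller mu means stronger privacy and therefore noisier output.
        print(mu, private_clipped_mean(data, bound=3.0, mu=mu, rng=rng))
```

The noise scale shrinks as the sample size or mu grows, which is the basic mechanism behind the tradeoff: stronger privacy (smaller mu) costs statistical accuracy at a fixed sample size.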

Computational Statistics & Data Analysis, Volume 201, Article 108041. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324001257/pdfft?md5=cab1d33929cc2c1071e939e0580ca683&pid=1-s2.0-S0167947324001257-main.pdf
Citations: 0
Test for the mean of high-dimensional functional time series
IF 1.5 | Zone 3, Mathematics | Q3 | COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2024-08-22 | DOI: 10.1016/j.csda.2024.108040
Lin Yang, Zhenghui Feng, Qing Jiang

The one-sample test and two-sample test for the mean of high-dimensional functional time series are considered in this study. The proposed tests are built on the dimension-wise max-norm of the sum of squares of diverging projections. The null distribution of the test statistics is investigated using normal approximation, and the asymptotic behavior under the alternative is studied. The approach is robust to cross-series dependence of unknown form and magnitude. To approximate the critical values, a blockwise wild bootstrap method for functional time series is employed. Both fully and partially observed data are analyzed in theoretical research and numerical studies. Evidence from simulation studies and an IT stock data case study demonstrates the usefulness of the test in practice. The proposed methods have been implemented in an R package.

本研究考虑了高维函数时间序列均值的单样本检验和双样本检验。提出的检验建立在发散投影平方和的维度最大正值基础上。使用正态近似法研究了检验统计量的零分布,并研究了备选方案下的渐近行为。该方法对未知形式和幅度的跨序列依赖性具有鲁棒性。为了近似临界值,采用了功能时间序列的 blockwise wild bootstrap 方法。在理论研究和数值研究中,对完全观测数据和部分观测数据都进行了分析。来自模拟研究和 IT 股票数据案例研究的证据证明了该检验方法在实践中的实用性。所提出的方法已在 R 软件包中实现。
{"title":"Test for the mean of high-dimensional functional time series","authors":"Lin Yang ,&nbsp;Zhenghui Feng ,&nbsp;Qing Jiang","doi":"10.1016/j.csda.2024.108040","DOIUrl":"10.1016/j.csda.2024.108040","url":null,"abstract":"<div><p>The one-sample test and two-sample test for the mean of high-dimensional functional time series are considered in this study. The proposed tests are built on the dimension-wise max-norm of the sum of squares of diverging projections. The null distribution of the test statistics is investigated using normal approximation, and the asymptotic behavior under the alternative is studied. The approach is robust to the cross-series dependence of unknown forms and magnitude. To approximate the critical values, a blockwise wild bootstrap method for functional time series is employed. Both fully and partially observed data are analyzed in theoretical research and numerical studies. Evidence from simulation studies and an IT stock data case study demonstrates the usefulness of the test in practice. The proposed methods have been implemented in a R package.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"201 ","pages":"Article 108040"},"PeriodicalIF":1.5,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324001245/pdfft?md5=a3ba37187b9ba57e45af87f61b64c9c8&pid=1-s2.0-S0167947324001245-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142084125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Community influence analysis in social networks
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-22 DOI: 10.1016/j.csda.2024.108037
Yuanxing Chen, Kuangnan Fang, Wei Lan, Chih-Ling Tsai, Qingzhao Zhang

Heterogeneous influence detection across network nodes is an important task in network analysis. A community influence model (CIM) is proposed to allow nodes to be classified into different communities (i.e., clusters or groups) such that the nodes within the same community share the common influence parameter. Employing the quasi-maximum likelihood approach, together with the fused lasso-type penalty, both the number of communities and the influence parameters can be estimated without imposing any specific distribution assumption on the error terms. The resulting estimators are shown to enjoy the oracle property; namely, they perform as well as if the true underlying network structure were known in advance. The proposed approach is also applicable for identifying influential nodes in a homogeneous setting. The performance of our method is illustrated via simulation studies and two empirical examples using stock data and coauthor citation data, respectively.

Citations: 0