Latest publications from Computational Statistics & Data Analysis

A dual-penalized approach to hypothesis testing in high-dimensional linear mediation models
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-09-24 · DOI: 10.1016/j.csda.2024.108064
The field of mediation analysis, specifically high-dimensional mediation analysis, has attracted great interest due to its applications in genetics, economics and other areas. Mediation analysis investigates how exposure variables influence an outcome variable via mediators, and effects are categorized as direct or indirect according to whether the influence passes through a mediator. A novel hypothesis testing method, called the dual-penalized method, is proposed to test direct and indirect effects. The method requires only mild conditions and has sound theoretical properties. Additionally, the asymptotic distributions of the proposed estimators are established to perform hypothesis testing. Results from simulation studies demonstrate that the dual-penalized method is highly effective, especially in weak signal settings. Furthermore, applying the method to the childhood trauma data set reveals a new mediator with a credible basis in biological processes.
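The exposure → mediator → outcome structure that the abstract builds on can be illustrated with a small simulation. The sketch below is only an illustration of that structure with a lasso-penalized outcome model; it is not the paper's dual-penalized test, and the variable names and simulation setup are assumptions made for the example.

```python
# Toy exposure -> mediators -> outcome structure with a lasso-penalized
# outcome model (illustration only; not the paper's dual-penalized test).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 50                                  # samples, candidate mediators

a_true = np.zeros(p); a_true[:2] = [1.0, -0.8]  # exposure -> mediator paths
b_true = np.zeros(p); b_true[:2] = [0.9, 0.7]   # mediator -> outcome paths
gamma_true = 0.5                                # direct effect of the exposure

X = rng.normal(size=n)                          # exposure
M = np.outer(X, a_true) + rng.normal(size=(n, p))
Y = gamma_true * X + M @ b_true + rng.normal(size=n)

# Mediator models: regress each mediator on the exposure (simple OLS here).
a_hat = np.array([np.polyfit(X, M[:, j], 1)[0] for j in range(p)])

# Outcome model: lasso of Y on (exposure, mediators); the exposure
# coefficient estimates the direct effect.
fit = Lasso(alpha=0.05).fit(np.column_stack([X, M]), Y)
gamma_hat, b_hat = fit.coef_[0], fit.coef_[1:]

indirect = a_hat * b_hat                        # mediator-wise indirect effects
print("direct effect estimate:", round(gamma_hat, 3))
print("top candidate mediators:", np.argsort(-np.abs(indirect))[:3])
```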
Citations: 0
A tree approach for variable selection and its random forest
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-09-18 · DOI: 10.1016/j.csda.2024.108068
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance for ultra-high dimensional regressions. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this, a novel screening approach is proposed that partitions the sample into subsets sequentially, creating a tree-like structure of sub-samples called SIS-tree. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results, including a “sure screening property”, are established to support the approach. Additionally, SIS-tree is extended to a forest with improved performance. Simulations demonstrate that the proposed methods improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, where the screening and cutoff can substantially improve the performance of existing classifiers. The proposed approaches can be implemented with the R package “SIStree”, available at https://github.com/liuyu-star/SIStree.
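Classical SIS, the baseline that the abstract improves on, ranks predictors by a marginal dependence measure and keeps the top d. The sketch below illustrates only that baseline (correlation screening with a commonly used cutoff); the SIS-tree partitioning and its forest extension are not reproduced, and the simulated data are an assumption for illustration.

```python
# Baseline sure independence screening: rank predictors by marginal
# dependence with the response and keep the top d (illustration only;
# the SIS-tree of the abstract recursively partitions the sample).
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2000                       # ultra-high dimensional: p >> n
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 7] + rng.normal(size=n)   # two active predictors

# Marginal screening statistic: absolute Pearson correlation with y.
Xc = (X - X.mean(0)) / X.std(0)
yc = (y - y.mean()) / y.std()
omega = np.abs(Xc.T @ yc) / n

d = int(n / np.log(n))                 # a common cutoff choice for SIS
keep = np.argsort(-omega)[:d]
print("screened-in set contains the active predictors:",
      {0, 7}.issubset(set(keep.tolist())))
```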
Citations: 0
Online graph topology learning from matrix-valued time series
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-09-16 · DOI: 10.1016/j.csda.2024.108065

The focus is on the statistical analysis of matrix-valued time series, where data are collected over a network of sensors, typically at spatial locations, over time. Each sensor records a vector of features at each time point, creating a vector-valued time series per sensor. The goal is to identify the dependency structure among these sensors and represent it with a graph. When only one feature per sensor is observed, vector auto-regressive (VAR) models are commonly used to infer Granger causality, resulting in a causal graph. The first contribution extends VAR models to matrix-variate models for the purpose of graph learning. Additionally, two online procedures are proposed for both low and high dimensions, enabling rapid updates of coefficient estimates as new samples arrive. In the high-dimensional setting, a novel Lasso-type approach is introduced, and homotopy algorithms are developed for online learning. An adaptive tuning procedure for the regularization parameter is also provided. Since applying auto-regressive models typically requires detrending, which is not feasible in an online context, the proposed AR models are augmented by incorporating the trend as an additional parameter, with a particular focus on periodic trends. The online algorithms are adapted to these augmented data models, allowing simultaneous learning of the graph and trend from streaming samples. Numerical experiments using both synthetic and real data demonstrate the effectiveness of the proposed methods.
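As background to the graph-learning step, the sketch below fits an ordinary VAR(1) by least squares and reads a Granger-causal graph off the thresholded coefficient matrix. It is a simplified batch illustration under assumed simulated data; the paper's matrix-variate, online, and Lasso/homotopy procedures are not reproduced.

```python
# Baseline idea: fit a VAR(1) by least squares and read a Granger-causal
# graph off the coefficient matrix (illustration only).
import numpy as np

rng = np.random.default_rng(2)
T, k = 500, 4                              # time points, number of sensors
A_true = np.array([[0.5, 0.0, 0.3, 0.0],
                   [0.0, 0.4, 0.0, 0.0],
                   [0.0, 0.2, 0.5, 0.0],
                   [0.0, 0.0, 0.0, 0.6]])

x = np.zeros((T, k))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + 0.5 * rng.normal(size=k)

# Least-squares VAR(1): regress x_t on x_{t-1}.
X_past, X_now = x[:-1], x[1:]
A_hat = np.linalg.lstsq(X_past, X_now, rcond=None)[0].T

# Edge j -> i in the causal graph if |A_hat[i, j]| exceeds a threshold.
graph = (np.abs(A_hat) > 0.1).astype(int)
print(graph)
```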

Citations: 0
A variational inference framework for inverse problems
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-09-16 · DOI: 10.1016/j.csda.2024.108055
A framework is presented for fitting inverse problem models via variational Bayes approximations. The methodology guarantees flexibility in statistical model specification for a broad range of applications, good accuracy and reduced model fitting times. The message passing and factor graph fragment approach to variational Bayes that is also described facilitates streamlined implementation of approximate inference algorithms and allows for flexible inclusion of numerous response distributions and penalizations in the inverse problem model. Models for one- and two-dimensional response variables are examined, and an infrastructure is laid down from which efficient algorithm updates, based on nullifying weak interactions between variables, can also be derived for inverse problems in higher dimensions. An image processing application and a simulation exercise motivated by biomedical problems reveal the computational advantage of an efficient implementation of variational Bayes over Markov chain Monte Carlo.
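To make the variational Bayes idea concrete, here is a minimal mean-field example for a conjugate Normal–Gamma model using standard coordinate-ascent updates. This is only an illustration of the factorised-approximation principle, not the message passing and factor graph fragment machinery of the paper; the model and hyperparameters are assumptions chosen for the example.

```python
# Minimal mean-field VB for x_i ~ N(mu, 1/tau), mu ~ N(mu0, 1/(lam0*tau)),
# tau ~ Gamma(a0, b0): coordinate-ascent updates for q(mu) q(tau).
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=200)
N, xbar = x.size, x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0           # prior hyperparameters
E_tau = a0 / b0                                   # initial guess for E[tau]

for _ in range(50):
    # q(mu) = N(mu_N, 1/lam_N)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # q(tau) = Gamma(a_N, b_N); expectations taken under the current q(mu)
    a_N = a0 + (N + 1) / 2
    E_sq = np.sum((x - mu_N) ** 2) + N / lam_N    # E[sum (x_i - mu)^2]
    b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
    E_tau = a_N / b_N

print("posterior mean of mu ~", round(mu_N, 3))
print("posterior mean of tau ~", round(E_tau, 3),
      "(true precision 1/1.5**2 =", round(1 / 1.5 ** 2, 3), ")")
```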
Citations: 0
Beta-CoRM: A Bayesian approach for n-gram profiles analysis
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-09-10 · DOI: 10.1016/j.csda.2024.108056

n-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. To address this, a novel class of Bayesian generative models is proposed for n-gram profiles treated as binary attributes. The flexibility of the proposed modelling allows for a straightforward approach to feature selection within the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure; applied to synthetic and real data scenarios, it shows that feature selection can improve classification accuracy.
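The data representation the model takes as input, an n-gram profile coded as binary attributes, can be built in a few lines. The sketch below only constructs that binary attribute matrix for toy sequences (an assumption for illustration); the Beta-CoRM generative model and its slice sampler are not shown.

```python
# Building binary n-gram profiles for sequences of differing lengths:
# the attribute matrix that generative models of this kind take as input.
from itertools import chain

def char_ngrams(seq, n=3):
    """Set of character n-grams present in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

sequences = ["GATTACAGATTACA", "CATCATCATCAT", "GATTACACATCAT"]
profiles = [char_ngrams(s) for s in sequences]

vocab = sorted(set(chain.from_iterable(profiles)))        # all observed n-grams
binary_matrix = [[int(g in prof) for g in vocab] for prof in profiles]

for s, row in zip(sequences, binary_matrix):
    print(s, "->", row)
```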

Citations: 0
Minimum profile Hellinger distance estimation of general covariate models
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-08-30 · DOI: 10.1016/j.csda.2024.108054

Covariate models, such as polynomial regression models, generalized linear models, and heteroscedastic models, are widely used in statistical applications. The importance of such models in statistical analysis is made abundantly clear by the ever-increasing rate at which articles on covariate models appear in the statistical literature. Because of their flexibility, covariate models are increasingly exploited as a convenient way to model data that consist of a response variable and one or more covariates that affect the outcome of the response variable. Efficient and robust estimates for broadly defined semiparametric covariate models are investigated using the minimum distance approach. In general, minimum distance estimators are automatically robust with respect to the stability of the quantity being estimated. In particular, minimum Hellinger distance estimation for parametric models produces estimators that are asymptotically efficient at the model density and simultaneously possess excellent robustness properties. For semiparametric covariate models, the minimum Hellinger distance method is extended and a minimum profile Hellinger distance estimator is proposed. Its asymptotic properties, such as consistency, are studied, and its finite-sample performance and robustness are examined using Monte Carlo simulations and three real data analyses. Additionally, a computing algorithm is developed to ease the computation of the estimator.
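As a point of reference for the minimum distance idea, the sketch below carries out textbook parametric minimum Hellinger distance estimation of a normal location parameter, matching a kernel density estimate to N(theta, 1); contaminated data illustrate the robustness. The semiparametric profile version proposed in the paper is not reproduced, and the simulated data are an assumption.

```python
# Parametric minimum Hellinger distance estimation for a normal location
# parameter: minimise H^2(f_hat, f_theta) = 1 - integral sqrt(f_hat * f_theta).
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(1.0, 1.0, 95), np.full(5, 10.0)])  # 5% gross outliers

kde = gaussian_kde(x)                      # nonparametric density estimate
grid = np.linspace(-5, 15, 2000)
f_hat = kde(grid)

def hellinger_sq(theta):
    # squared Hellinger distance between the KDE and the N(theta, 1) density
    f_theta = norm.pdf(grid, loc=theta, scale=1.0)
    return 1.0 - np.sum(np.sqrt(f_hat * f_theta)) * (grid[1] - grid[0])

mhd = minimize_scalar(hellinger_sq, bounds=(-5, 15), method="bounded").x
print("sample mean (non-robust):", round(x.mean(), 3))
print("minimum Hellinger estimate:", round(mhd, 3))
```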

Citations: 0
Robust direction estimation in single-index models via cumulative divergence
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-08-30 · DOI: 10.1016/j.csda.2024.108052

In this paper, we address direction estimation in single-index models, with a focus on heavy-tailed data applications. Our method utilizes cumulative divergence to directly capture the conditional mean dependence between the response variable and the index predictor, resulting in a model-free property that obviates the need for initial link function estimation. Furthermore, our approach allows heavy-tailed predictors and is robust against the presence of outliers, leveraging the rank-based nature of cumulative divergence. We establish theoretical properties for our proposal under mild regularity conditions and illustrate its solid performance through comprehensive simulations and real data analysis.
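For context, the sketch below shows the classical baseline direction estimate in a single-index model: with Gaussian predictors, the least-squares coefficient of Y on X is proportional to the true index direction (the Li–Duan/Brillinger result) even though the link function is unknown. The cumulative-divergence estimator of the paper is not reproduced, and the simulated model is an assumption for illustration.

```python
# Baseline (non-robust) direction estimate in a single-index model
# Y = g(beta' X) + eps: with Gaussian X, OLS recovers the direction of beta
# up to a constant, without knowing the link g.
import numpy as np

rng = np.random.default_rng(6)
n, p = 500, 5
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
beta /= np.linalg.norm(beta)

X = rng.normal(size=(n, p))
y = np.sin(X @ beta) + 0.1 * rng.normal(size=n)     # unknown link g = sin

b_ols, *_ = np.linalg.lstsq(X - X.mean(0), y - y.mean(), rcond=None)
b_ols /= np.linalg.norm(b_ols)
print("cosine similarity with true direction:", round(abs(b_ols @ beta), 3))
```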

Citations: 0
A Bayesian cluster validity index
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-08-30 · DOI: 10.1016/j.csda.2024.108053

Selecting the appropriate number of clusters is a critical step in applying clustering algorithms. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal number of clusters within a dataset. However, users may not always seek the absolute optimal number of clusters but rather a secondary option that better aligns with their specific application. This realization motivates a Bayesian cluster validity index (BCVI), which builds upon existing indices. The BCVI utilizes either Dirichlet or generalized Dirichlet priors, resulting in the same posterior distribution. The proposed BCVI is evaluated using the Calinski–Harabasz, CVNN, Davies–Bouldin, silhouette, Starczewski, and Wiroonsri indices as underlying indices for hard clustering, and the KWON2, Wiroonsri–Preedasawakul, and Xie–Beni indices for soft clustering. The performance of the proposed BCVI is compared with that of the original underlying indices. The BCVI offers clear advantages in situations where user expertise is valuable, allowing users to specify their desired range for the final number of clusters. To illustrate this, experiments spanning three different scenarios are conducted. Additionally, the practical applicability of the proposed approach is demonstrated on real-world datasets, such as MRI brain tumor images. These tools are published in the recent R package ‘BayesCVI’.
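Several of the underlying hard-clustering indices named above are available in scikit-learn, so the basic "scan k and score" step can be illustrated directly. The sketch below computes three of them over a range of k; the Bayesian reweighting that defines the BCVI (and the 'BayesCVI' R package itself) is not reproduced.

```python
# Score candidate numbers of clusters with three classical validity indices.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             davies_bouldin_score,
                             calinski_harabasz_score)

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=0)

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}  silhouette={silhouette_score(X, labels):.3f}  "
          f"DB={davies_bouldin_score(X, labels):.3f}  "
          f"CH={calinski_harabasz_score(X, labels):.1f}")
```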

Citations: 0
On the use of the cumulant generating function for inference on time series
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-08-28 · DOI: 10.1016/j.csda.2024.108044

Innovative inference procedures for analyzing time series data are introduced. The methodology covers density approximation and composite hypothesis testing based on Whittle's estimator, a widely applied M-estimator in the frequency domain. Its core feature is the cumulant generating function of Whittle's score, obtained from an approximate distribution of the periodogram ordinates. The testing algorithm not only significantly expands the applicability of the state-of-the-art saddlepoint test, but also maintains the numerical accuracy of the saddlepoint approximation. Connections are made with three other prevalent frequency domain techniques: the bootstrap, empirical likelihood, and exponential tilting. Numerical examples using both simulated and real data illustrate the advantages and accuracy of the saddlepoint methods.
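Whittle's estimator, the starting point of the methodology, minimises a frequency-domain objective built from the periodogram. The sketch below illustrates it for an AR(1) model (an assumed example); the cumulant generating function and saddlepoint testing machinery of the paper are not reproduced.

```python
# Whittle's estimator for an AR(1): minimise the frequency-domain
# (Whittle) negative log-likelihood built from the periodogram.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, phi_true, sigma2 = 1000, 0.6, 1.0
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=np.sqrt(sigma2))

# Periodogram at the positive Fourier frequencies.
freqs = 2 * np.pi * np.arange(1, n // 2) / n
I = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)

def whittle_nll(params):
    phi, s2 = params
    # AR(1) spectral density: s2 / (2*pi*|1 - phi*exp(-i w)|^2)
    f = s2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * freqs)) ** 2)
    return np.sum(np.log(f) + I / f)

res = minimize(whittle_nll, x0=[0.0, 0.5],
               bounds=[(-0.99, 0.99), (1e-3, None)])
print("Whittle estimates (phi, sigma^2):", np.round(res.x, 3))
```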

Citations: 0
Test for the mean of high-dimensional functional time series
IF 1.5 · Tier 3 (Mathematics) · Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2024-08-22 · DOI: 10.1016/j.csda.2024.108040

One-sample and two-sample tests for the mean of high-dimensional functional time series are considered. The proposed tests are built on the dimension-wise max-norm of the sum of squares of diverging projections. The null distribution of the test statistics is investigated using normal approximation, and the asymptotic behavior under the alternative is studied. The approach is robust to cross-series dependence of unknown form and magnitude. To approximate the critical values, a blockwise wild bootstrap method for functional time series is employed. Both fully and partially observed data are analyzed in the theoretical results and numerical studies. Evidence from simulation studies and an IT stock data case study demonstrates the usefulness of the test in practice. The proposed methods have been implemented in an R package.
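A heavily simplified version of a max-type mean test can be sketched for independent, fully observed curves, with a Gaussian multiplier bootstrap for the critical value. This is only an illustration of the general max-norm/bootstrap idea under assumed simulated data; the paper's projection-based statistic and blockwise wild bootstrap for dependent functional data are not reproduced.

```python
# Simplified one-sample max-type mean test with a multiplier bootstrap
# (i.i.d. curves observed on a common grid; illustration only).
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 50                       # curves, grid points per curve
X = rng.normal(size=(n, p))          # H0: mean function is zero
X += 0.3 * np.sin(np.linspace(0, np.pi, p))   # small departure from H0

xbar = X.mean(0)
T_obs = np.max(np.abs(np.sqrt(n) * xbar))     # max-norm test statistic

# Multiplier bootstrap: perturb centred curves with standard normal weights.
B, centred = 1000, X - xbar
T_boot = np.array([
    np.max(np.abs((rng.normal(size=n) @ centred) / np.sqrt(n)))
    for _ in range(B)
])
print("bootstrap p-value:", np.mean(T_boot >= T_obs))
```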

Citations: 0