
Computational Statistics & Data Analysis: Latest Articles

Online kernel sliced inverse regression
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-16 DOI: 10.1016/j.csda.2024.108071
Jianjun Xu , Yue Zhao , Haoyang Cheng
Online dimension reduction techniques are widely utilized for handling high-dimensional streaming data. Extensive research has been conducted on various methods, including Online Principal Component Analysis, Online Sliced Inverse Regression (OSIR), and Online Kernel Principal Component Analysis (OKPCA). However, it is important to note that the exploration of online supervised nonlinear dimension reduction techniques is still limited. This article presents a novel approach called Online Kernel Sliced Inverse Regression (OKSIR), which specifically tackles the challenge of dealing with the increasing dimension of the kernel matrix as the sample size grows. The proposed method incorporates two key components: the approximate linear dependence condition and dictionary variable sets. These components enable a reduced-order approach for online variable updates, improving the efficiency of the process. To solve the OKSIR problem, we formulate it as an online generalized eigen-decomposition problem and employ stochastic optimization techniques to update the dimension reduction directions. Theoretical properties of this online learner are established, providing a solid foundation for its application. Through extensive simulations and real data analysis, we demonstrate that the proposed OKSIR method achieves performance comparable to that of batch processing kernel sliced inverse regression. This research significantly contributes to the advancement of online dimension reduction techniques, enhancing their effectiveness in practical applications.
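The approximate linear dependence (ALD) condition mentioned above can be illustrated with a minimal sketch. It assumes a Gaussian kernel and the usual ALD test from the kernel recursive least squares literature: a new point joins the dictionary only if its feature-space image is not well approximated by the span of the current dictionary. The function names, the `threshold` and `gamma` parameters, and the small `jitter` term are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two 1-D arrays (assumed kernel choice)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def ald_residual(dictionary, x, gamma=1.0, jitter=1e-10):
    """Squared residual of projecting phi(x) onto the span of the dictionary features."""
    if not dictionary:
        return rbf_kernel(x, x, gamma)
    # Kernel matrix of the dictionary and kernel vector against the new point.
    K = np.array([[rbf_kernel(u, v, gamma) for v in dictionary] for u in dictionary])
    k = np.array([rbf_kernel(u, x, gamma) for u in dictionary])
    # Small jitter keeps the linear solve stable when dictionary points are close.
    coef = np.linalg.solve(K + jitter * np.eye(len(dictionary)), k)
    return rbf_kernel(x, x, gamma) - float(k @ coef)

def update_dictionary(dictionary, x, threshold=1e-3, gamma=1.0):
    """Append x only if it is NOT approximately linearly dependent on the dictionary."""
    if ald_residual(dictionary, x, gamma) > threshold:
        dictionary.append(np.asarray(x, dtype=float))
        return True
    return False
```

Because only dictionary points enter the kernel matrix, its size is bounded by the dictionary size rather than by the number of samples seen so far, which is the motivation the abstract gives for the reduced-order update.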
Citations: 0
Multiple network embedding for anomaly detection in time series of graphs
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-05 DOI: 10.1016/j.csda.2024.108070
Guodong Chen , Jesús Arroyo , Avanti Athreya , Joshua Cape , Joshua T. Vogelstein , Youngser Park , Chris White , Jonathan Larson , Weiwei Yang , Carey E. Priebe
The problem of anomaly detection in time series of graphs is considered, focusing on two related inference tasks: the detection of anomalous graphs within a time series and the detection of temporally anomalous vertices. These tasks are approached via the adaptation of multiple adjacency spectral embedding (MASE), a statistically principled method for joint graph inference. The effectiveness of the method is demonstrated for these inference tasks, and its performance is assessed based on the nature of detectable anomalies. Theoretical justification is provided, along with insights into its use. The approach identifies anomalous vertices beyond just large degree changes when applied to the Enron communication graph, a large-scale commercial search engine time series, and a larval Drosophila connectome.
Citations: 0
Fusion regression methods with repeated functional data
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-27 DOI: 10.1016/j.csda.2024.108069
Issam-Ali Moindjié , Cristian Preda , Sophie Dabo-Niang
Linear regression and classification methods with repeated functional data are considered. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions related by some neighborhood structure (spatial, group, etc.). Two regression methods based on fusion penalties are proposed to account for the dependence induced by this structure. These methods aim to obtain parsimonious regression coefficient functions by determining whether neighboring conditions share common regression coefficient functions. The first method generalizes to functional data the variable fusion methodology based on the 1-nearest neighbor. The second relies on the group fusion lasso penalty, which assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. Numerical simulations and an application to electroencephalography data are presented.
Citations: 0
A dual-penalized approach to hypothesis testing in high-dimensional linear mediation models
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-24 DOI: 10.1016/j.csda.2024.108064
Chenxuan He , Yiran He , Wangli Xu
The field of mediation analysis, specifically high-dimensional mediation analysis, has attracted great interest due to its applications in genetics, economics and other areas. Mediation analysis aims to investigate how exposure variables influence the outcome variable via mediators, and the effects are categorized as direct or indirect according to whether the influence is mediated. A novel hypothesis testing method, called the dual-penalized method, is proposed to test direct and indirect effects. This method requires only mild conditions and has sound theoretical properties. Additionally, the asymptotic distributions of the proposed estimators are established to perform hypothesis testing. Results from simulation studies demonstrate that the dual-penalized method is highly effective, especially in weak signal settings. Furthermore, the application of this method to the childhood trauma data set reveals a new mediator with a credible basis in biological processes.
Citations: 0
A tree approach for variable selection and its random forest
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-18 DOI: 10.1016/j.csda.2024.108068
Yu Liu , Xu Qin , Zhibo Cai
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance for ultra-high dimensional regressions. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this problem, a novel screening approach is proposed that partitions the sample into subsets sequentially and creates a tree-like structure of sub-samples called SIS-tree. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support this approach, including its “sure screening property”. Additionally, SIS-tree is extended to a forest with improved performance. Through simulations, the proposed methods are shown to improve substantially on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classifications of high-dimensional data are considered, and it is found that the screening and cutoff can substantially improve the performance of existing classifiers. The proposed approaches can be implemented using the R package “SIStree” at https://github.com/liuyu-star/SIStree.
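For readers unfamiliar with the baseline, classical SIS can be sketched in a few lines: rank predictors by the absolute marginal correlation with the response and keep the top `d`. This is a minimal illustration of classical SIS only, not of the SIS-tree procedure or the “SIStree” R package; the function names and the cutoff `d` are assumptions made for the example.

```python
import numpy as np

def sis_rank(X, y):
    """Rank columns of X by absolute marginal Pearson correlation with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, np.inf, denom)  # constant columns get correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    order = np.argsort(-corr)        # most correlated first
    return order, corr

def sis_screen(X, y, d):
    """Indices of the d predictors with the largest marginal correlation."""
    order, _ = sis_rank(X, y)
    return order[:d]
```

Because each predictor is scored on its own, a variable that is merely correlated with a true signal can rank highly, which is the "false importance" issue the abstract refers to.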
Citations: 0
Online graph topology learning from matrix-valued time series
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-16 DOI: 10.1016/j.csda.2024.108065
Yiye Jiang , Jérémie Bigot , Sofian Maabout

The focus is on the statistical analysis of matrix-valued time series, where data is collected over a network of sensors, typically at spatial locations, over time. Each sensor records a vector of features at each time point, creating a vectorial time series for each sensor. The goal is to identify the dependency structure among these sensors and represent it with a graph. When only one feature per sensor is observed, vector auto-regressive (VAR) models are commonly used to infer Granger causality, resulting in a causal graph. The first contribution extends VAR models to matrix-variate models for the purpose of graph learning. Additionally, two online procedures are proposed for both low and high dimensions, enabling rapid updates of coefficient estimates as new samples arrive. In the high-dimensional setting, a novel Lasso-type approach is introduced, and homotopy algorithms are developed for online learning. An adaptive tuning procedure for the regularization parameter is also provided. Given that the application of auto-regressive models to data typically requires detrending, which is not feasible in an online context, the proposed AR models are augmented by incorporating trend as an additional parameter, with a particular focus on periodic trends. The online algorithms are adapted to these augmented data models, allowing for simultaneous learning of the graph and trend from streaming samples. Numerical experiments using both synthetic and real data demonstrate the effectiveness of the proposed methods.

Citations: 0
A variational inference framework for inverse problems
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-16 DOI: 10.1016/j.csda.2024.108055
Luca Maestrini , Robert G. Aykroyd , Matt P. Wand
A framework is presented for fitting inverse problem models via variational Bayes approximations. This methodology guarantees flexibility to statistical model specification for a broad range of applications, good accuracy and reduced model fitting times. The message passing and factor graph fragment approach to variational Bayes that is also described facilitates streamlined implementation of approximate inference algorithms and allows for supple inclusion of numerous response distributions and penalizations into the inverse problem model. Models for one- and two-dimensional response variables are examined and an infrastructure is laid down where efficient algorithm updates based on nullifying weak interactions between variables can also be derived for inverse problems in higher dimensions. An image processing application and a simulation exercise motivated by biomedical problems reveal the computational advantage offered by efficient implementation of variational Bayes over Markov chain Monte Carlo.
Citations: 0
Spline regression with automatic knot selection
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-16 DOI: 10.1016/j.csda.2024.108043
Vivien Goepp , Olivier Bouaziz , Grégory Nuel
Spline regression has proven to be a useful tool for nonparametric regression. The flexibility of this function family is based on basepoints defining shifts in the behavior of the function – called knots. The question of setting an adequate number of knots and their placement is usually sidestepped by penalizing the spline's overall smoothness (e.g. P-splines). However, there are areas of application where finding the best knot placement is of interest. A new method is introduced for automatically selecting knots in spline regression. The approach consists of setting many initial knots and fitting the spline regression through a penalized likelihood procedure called adaptive ridge, which discards the least relevant knots. The method – called A-splines, for adaptive splines – compares favorably with other knot selection methods: it runs much faster (∼10 to ∼400 times faster) than comparable methods and has nearly equal predictive performance. A-splines are applied to both simulated and real datasets.
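The adaptive ridge idea can be sketched as an iteratively reweighted ridge regression whose weights are updated so that the penalty approximates a count of nonzero coefficients. The sketch below applies it to ordinary linear regression coefficients for brevity; in a spline setting the penalized quantities would be associated with knots rather than raw coefficients, and that part is not reproduced here. The function name, `lam`, `eps`, `n_iter`, and the 0.99 selection threshold are assumptions for illustration.

```python
import numpy as np

def adaptive_ridge(X, y, lam=1.0, eps=1e-5, n_iter=50):
    """Iteratively reweighted ridge: minimizes ||y - X b||^2 + lam * sum_j w_j b_j^2,
    with w_j = 1 / (b_j^2 + eps^2) refreshed after every solve."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    w = np.ones(p)                   # first pass is a plain ridge fit
    XtX = X.T @ X
    Xty = X.T @ y
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = np.linalg.solve(XtX + lam * np.diag(w), Xty)
        w = 1.0 / (beta ** 2 + eps ** 2)
    # w_j * beta_j^2 is close to 1 for retained terms and close to 0 for discarded ones.
    selected = w * beta ** 2 > 0.99
    return beta, selected
```

Each iteration is a single linear solve, which is one plausible reason a procedure of this kind can be fast compared with combinatorial searches over knot subsets.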
Citations: 0
Beta-CoRM: A Bayesian approach for n-gram profiles analysis
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-09-10 DOI: 10.1016/j.csda.2024.108056
José A. Perusquía , Jim E. Griffin , Cristiano Villa

n-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models for n-gram profiles used as binary attributes has been designed to address this. The flexibility of the proposed modelling allows a straightforward approach to feature selection in the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure, which is applied to synthetic and real data scenarios and shows that feature selection can improve classification accuracy.

Citations: 0
Minimum profile Hellinger distance estimation of general covariate models
IF 1.5 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-08-30 DOI: 10.1016/j.csda.2024.108054
Bowei Ding, Rohana J. Karunamuni, Jingjing Wu

Covariate models, such as polynomial regression models, generalized linear models, and heteroscedastic models, are widely used in statistical applications. The importance of such models in statistical analysis is abundantly clear from the ever-increasing rate at which articles on covariate models are appearing in the statistical literature. Because of their flexibility, covariate models are increasingly being exploited as a convenient way to model data that consist of both a response variable and one or more covariates that affect the outcome of the response variable. Efficient and robust estimates for broadly defined semiparametric covariate models are investigated, and for this purpose the minimum distance approach is employed. In general, minimum distance estimators are automatically robust with respect to the stability of the quantity being estimated. In particular, minimum Hellinger distance estimation for parametric models produces estimators that are asymptotically efficient at the model density and simultaneously possess excellent robustness properties. For semiparametric covariate models, the minimum Hellinger distance method is extended and a minimum profile Hellinger distance estimator is proposed. Its asymptotic properties, such as consistency, are studied, and its finite-sample performance and robustness are examined using Monte Carlo simulations and three real data analyses. Additionally, a computing algorithm is developed to ease the computation of the estimator.
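The robustness claim for parametric minimum Hellinger distance (MHD) estimation can be illustrated with a small sketch. It estimates the location of a normal model with unit scale by maximizing the Hellinger affinity, the integral of sqrt(f_hat(x) f_mu(x)), between a kernel density estimate f_hat and the model density f_mu, which is equivalent to minimizing the Hellinger distance. Everything here is an assumption made for illustration: the fixed bandwidth of 0.4, the grid search, the simulated data with 10% outliers, and the function names. It is the plain parametric version only, not the minimum profile Hellinger distance estimator or the computing algorithm proposed in the paper.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def make_grid(lo, hi, step):
    return [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]


def kernel_density(data, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at every grid point."""
    n = len(data)
    return [sum(normal_pdf(x, xi, bandwidth) for xi in data) / n for x in grid]


def hellinger_affinity(f_hat, grid, step, mu):
    # Riemann-sum approximation of the integral of sqrt(f_hat(x) * f_mu(x)).
    return step * sum(math.sqrt(f * normal_pdf(x, mu))
                      for f, x in zip(f_hat, grid))


def mhd_location(data, bandwidth=0.4, step=0.05):
    """Grid-search MHD estimate of mu in the N(mu, 1) model (assumed setup)."""
    grid = make_grid(min(data) - 4.0, max(data) + 4.0, step)
    f_hat = kernel_density(data, grid, bandwidth)
    candidates = make_grid(min(data), max(data), step)
    # Minimizing the Hellinger distance is the same as maximizing the affinity.
    return max(candidates,
               key=lambda mu: hellinger_affinity(f_hat, grid, step, mu))


random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(180)]
data = clean + [8.0] * 20  # 10% gross outliers placed at 8 (hypothetical)
sample_mean = sum(data) / len(data)
mu_hat = mhd_location(data)
print(f"sample mean = {sample_mean:.2f}, MHD estimate = {mu_hat:.2f}")
```

In this setup the sample mean is pulled toward the outliers, while the MHD estimate stays near the centre of the clean observations, because the square root in the affinity downweights regions where the model density is negligible.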

Citations: 0