Bayesian taut splines for estimating the number of modes
José E. Chacón, Javier Fernández Serrano
Pub Date: 2024-04-15 | DOI: 10.1016/j.csda.2024.107961
The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.
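The bandwidth sensitivity of classical mode counting is easy to see in code. Below is a minimal sketch (assuming Python with NumPy/SciPy) of the modality-driven baseline the paper compares against, namely counting local maxima of a kernel density estimate on a grid; it is not the authors' Bayesian taut-spline method, and the two-component mixture used as test data is purely illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Two subpopulations, so the true number of modes is 2.
sample = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

def count_kde_modes(data, bw_factor, grid_size=512):
    """Count local maxima of a Gaussian KDE evaluated on a fine grid."""
    kde = gaussian_kde(data, bw_method=bw_factor)
    grid = np.linspace(data.min() - 1, data.max() + 1, grid_size)
    dens = kde(grid)
    # Interior grid points strictly above both neighbours are local maxima.
    is_peak = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(is_peak.sum())

# The answer depends strongly on the bandwidth: small bandwidths create
# spurious modes, large ones merge genuine subpopulations.
for bw in (0.05, 0.2, 1.0):
    print(bw, count_kde_modes(sample, bw))
```

This instability under a purely local criterion is one motivation for the paper's holistic, uncertainty-aware treatment of modes.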
{"title":"Bayesian taut splines for estimating the number of modes","authors":"José E. Chacón , Javier Fernández Serrano","doi":"10.1016/j.csda.2024.107961","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107961","url":null,"abstract":"<div><p>The number of modes in a probability density function is representative of the complexity of a model and can also be viewed as the number of subpopulations. Despite its relevance, there has been limited research in this area. A novel approach to estimating the number of modes in the univariate setting is presented, focusing on prediction accuracy and inspired by some overlooked aspects of the problem: the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view that blends local and global density properties. The technique combines flexible kernel estimators and parsimonious compositional splines in the Bayesian inference paradigm, providing soft solutions and incorporating expert judgment. The procedure includes feature exploration, model selection, and mode testing, illustrated in a sports analytics case study showcasing multiple companion visualisation tools. A thorough simulation study also demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, the new method emerges as a top-tier alternative, offering innovative solutions for analysts.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000458/pdfft?md5=9c9dde675ebe359be2107f0ce88120f0&pid=1-s2.0-S0167947324000458-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140605592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler
Jiayu Qian, Yuanyuan Liu, Jingya Yang, Qingping Zhou
Pub Date: 2024-04-10 | DOI: 10.1016/j.csda.2024.107930
Bayesian inference with a deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The prior distribution is learned from the available measurements, so its selection amounts to an important representation-learning step. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art alternatives, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
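As background for the sampler's name, the following sketch shows a plain preconditioned Crank-Nicolson (pCN) proposal under a standard Gaussian prior, one of the two building blocks of the paper's HMC-pCN algorithm (the HMC component is omitted). The log-likelihood is a hypothetical stand-in; in the paper it would come from the SA-Roundtrip generator composed with the CT forward operator.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(z):
    # Hypothetical data-misfit term used only for this sketch.
    return -0.5 * np.sum((z - 1.0) ** 2)

def pcn_chain(dim, n_steps=5000, beta=0.2):
    """pCN Metropolis-Hastings under a standard Gaussian prior on z."""
    z = rng.standard_normal(dim)
    samples = np.empty((n_steps, dim))
    for i in range(n_steps):
        # pCN proposal: it preserves the Gaussian prior, so the acceptance
        # ratio involves only the likelihood terms.
        prop = np.sqrt(1.0 - beta**2) * z + beta * rng.standard_normal(dim)
        if np.log(rng.uniform()) < log_likelihood(prop) - log_likelihood(z):
            z = prop
        samples[i] = z
    return samples

chain = pcn_chain(dim=4)
print(chain[1000:].mean(axis=0))  # posterior mean after burn-in, about 0.5 here
```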
{"title":"Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler","authors":"Jiayu Qian , Yuanyuan Liu , Jingya Yang , Qingping Zhou","doi":"10.1016/j.csda.2024.107930","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107930","url":null,"abstract":"<div><p>Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140555566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sufficient dimension reduction for a novel class of zero-inflated graphical models
Eric Koplin, Liliana Forzani, Diego Tomassi, Ruth M. Pfeiffer
Pub Date: 2024-04-08 | DOI: 10.1016/j.csda.2024.107959
Graphical models allow modeling of complex dependencies among components of a random vector. In many applications of graphical models, however, such as microbiome data, the data contain an excess number of zero values. New pairwise graphical models with distributions in an exponential family are presented that accommodate excess numbers of zeros in the random vector components. First, these multivariate distributions are characterized in terms of univariate conditional distributions. Then predictors that arise from such a pairwise graphical model with excess zeros are modeled as functions of an outcome, and the corresponding first-order sufficient dimension reduction (SDR) is derived. That is, linear combinations of the predictors that contain all the information for the regression of the outcome as a function of the predictors are obtained. To incorporate variable selection, the SDR is estimated using a pseudo-likelihood with a hierarchical penalty that prioritizes sparse interactions only for variables associated with the outcome. These methods yield consistent estimators of the reduction and can be applied to continuous or categorical outcomes. The new methods are illustrated by studying normal, Poisson and truncated Poisson graphical models with excess zeros in simulations and by analyzing microbiome data from the American Gut Project. The models provided robust variable selection, and the predictive performance of the Poisson zero-inflated pairwise graphical model was equal to or better than that of other available methods for the analysis of microbiome data.
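For readers unfamiliar with first-order SDR, the sketch below runs classical sliced inverse regression (SIR) on simulated single-index data. SIR is a standard first-order SDR estimator used here only as a generic reference, not the paper's penalized pseudo-likelihood method for zero-inflated graphical models; all data-generating choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2000, 6
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = np.tanh(X @ beta_true) + 0.1 * rng.standard_normal(n)

def sir_direction(X, y, n_slices=10):
    """Leading SIR direction: top eigenvector of the between-slice covariance."""
    Xc = X - X.mean(axis=0)
    # Whiten: with L such that L L' = Sigma^{-1}, Z = Xc L has identity covariance.
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L
    order = np.argsort(y)
    M = sum(len(idx) / len(y) * np.outer(Z[idx].mean(axis=0), Z[idx].mean(axis=0))
            for idx in np.array_split(order, n_slices))
    eigvals, eigvecs = np.linalg.eigh(M)
    b = L @ eigvecs[:, -1]          # map back to the original coordinates
    return b / np.linalg.norm(b)

print(np.round(sir_direction(X, y), 2))  # ~ +/- beta_true / ||beta_true||
```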
{"title":"Sufficient dimension reduction for a novel class of zero-inflated graphical models","authors":"Eric Koplin , Liliana Forzani , Diego Tomassi , Ruth M. Pfeiffer","doi":"10.1016/j.csda.2024.107959","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107959","url":null,"abstract":"<div><p>Graphical models allow modeling of complex dependencies among components of a random vector. In many applications of graphical models, however, for example microbiome data, the data have an excess number of zero values. New pairwise graphical models with distributions in an exponential family are presented, that accommodate excess numbers of zeros in the random vector components. First these multivariate distributions are characterized in terms of univariate conditional distributions. Then predictors that arise from such a pairwise graphical model with excess zeros are modeled as functions of an outcome, and the corresponding first order sufficient dimension reduction (SDR) is derived. That is, linear combinations of the predictors that contain all the information for the regression of the outcome as a function of the predictors are obtained. To incorporate variable selection, the SDR is estimated using a pseudo-likelihood with a hierarchical penalty that prioritizes sparse interactions only for variables associated with the outcome. These methods yield consistent estimators of the reduction and can be applied to continuous or categorical outcomes. The new methods are then illustrated by studying normal, Poisson and truncated Poisson graphical models with excess zeros in simulations and by analyzing microbiome data from the American Gut Project. The models provided robust variable selection and the predictive performance of the Poisson zero-inflated pairwise graphical model was equal or better than that of other available methods for the analysis of microbiome data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140619396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mixture of logistic skew-normal multinomial models
Wangshu Tu, Ryan Browne, Sanjeena Subedi
Pub Date: 2024-04-03 | DOI: 10.1016/j.csda.2024.107946
The logistic normal multinomial distribution is gaining interest in modelling microbiome data. It utilizes a hierarchical structure such that the observed counts conditional on the compositions are assumed to be multinomial random variables and the log-ratio transformed compositions are assumed to follow a Gaussian distribution. While the multinomial distribution accounts for the compositional nature of the data, and a Gaussian prior offers flexibility in the structure of covariance matrices, the log-ratio transformed compositions of microbiome data can be highly skewed, especially at lower taxonomic levels. Thus, a Gaussian distribution may not be an ideal prior for the log-ratio transformed compositions. A novel mixture of logistic skew-normal multinomial (LSNM) distributions is proposed in which a multivariate skew-normal distribution is utilized as a prior for the log-ratio transformed compositions. A variational Gaussian approximation in conjunction with the EM algorithm is utilized for parameter estimation.
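A small generative sketch of the logistic normal multinomial hierarchy may help fix ideas: latent log-ratio coordinates are drawn from a Gaussian (the paper's contribution replaces this with a skew-normal mixture), mapped to compositions by the inverse additive log-ratio transform, and counts are then multinomial. All dimensions and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4                                  # number of taxa; K - 1 log-ratio coordinates
mu = np.array([0.5, -0.5, 1.0])        # illustrative Gaussian parameters
Sigma = 0.3 * np.eye(K - 1)

def sample_counts(n_samples, depth=1000):
    Y = rng.multivariate_normal(mu, Sigma, size=n_samples)   # latent log-ratios
    # Inverse additive log-ratio transform, with the last taxon as reference.
    expY = np.exp(np.column_stack([Y, np.zeros(n_samples)]))
    comps = expY / expY.sum(axis=1, keepdims=True)           # compositions
    return np.array([rng.multinomial(depth, p) for p in comps])

print(sample_counts(5))
```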
{"title":"A mixture of logistic skew-normal multinomial models","authors":"Wangshu Tu , Ryan Browne , Sanjeena Subedi","doi":"10.1016/j.csda.2024.107946","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107946","url":null,"abstract":"<div><p>The logistic normal multinomial distribution is gaining interest in modelling microbiome data. It utilizes a hierarchical structure such that the observed counts conditional on the compositions are assumed to be multinomial random variables and the log-ratio transformed compositions are assumed to be from a Gaussian distribution. While multinomial distribution accounts for the compositional nature of the data, and a Gaussian prior offers flexibility in the structure of covariance matrices, the log-ratio transformed compositions of the microbiome data can be highly skewed, especially at a lower taxonomic level. Thus, a Gaussian distribution may not be an ideal prior for the log-ratio transformed compositions. A novel mixture of logistic skew-normal multinomial (LSNM) distribution is proposed in which a multivariate skew-normal distribution is utilized as a prior for the log-ratio transformed compositions. A variational Gaussian approximation in conjunction with the EM algorithm is utilized for parameter estimation.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140607145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semiparametric accelerated failure time models under unspecified random effect distributions
Byungtae Seo, Il Do Ha
Pub Date: 2024-03-27 | DOI: 10.1016/j.csda.2024.107958
Accelerated failure time (AFT) models with random effects, a useful alternative to frailty models, have been widely used for analyzing clustered (or correlated) time-to-event data. In the AFT model, the distribution of the unobserved random effect is conventionally assumed to be parametric, often modeled as a normal distribution. Although it is known that a misspecified random-effect distribution has little effect on regression parameter estimates, in some cases the impact of such misspecification is not negligible. Particularly when our focus extends to quantities associated with random effects, the problem can become worse. In this paper, we propose a semi-parametric maximum likelihood approach in which the random-effect distribution under the AFT model is left unspecified. We provide a feasible algorithm to estimate the random-effect distribution as well as the model parameters. Through comprehensive simulation studies, our results demonstrate the effectiveness of this proposed method across a range of random-effect distribution types (discrete or continuous) and under conditions of heavy censoring. The efficacy of the approach is further illustrated through real-world data examples.
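To make the motivating setting concrete, the following sketch simulates clustered AFT data whose cluster-level random effect is deliberately bimodal rather than normal, with heavy right-censoring; it generates the kind of data for which a misspecified normal random-effect assumption can matter. All parameter choices are illustrative and no fitting is performed.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 0.8
n_clusters, cluster_size = 50, 10

# Bimodal cluster-level random effects: a mixture of two separated normals.
b = np.where(rng.uniform(size=n_clusters) < 0.5,
             rng.normal(-1.5, 0.3, n_clusters),
             rng.normal(1.5, 0.3, n_clusters))

x = rng.standard_normal((n_clusters, cluster_size))
log_t = beta * x + b[:, None] + rng.normal(0.0, 0.5, (n_clusters, cluster_size))
t = np.exp(log_t)

# Independent right-censoring tuned to be heavy.
c = rng.exponential(np.median(t), size=t.shape)
time = np.minimum(t, c)          # observed follow-up time
status = (t <= c).astype(int)    # 1 = event, 0 = censored
print(f"censoring rate: {1 - status.mean():.2f}")
```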
{"title":"Semiparametric accelerated failure time models under unspecified random effect distributions","authors":"Byungtae Seo , Il Do Ha","doi":"10.1016/j.csda.2024.107958","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107958","url":null,"abstract":"<div><p>Accelerated failure time (AFT) models with random effects, a useful alternative to frailty models, have been widely used for analyzing clustered (or correlated) time-to-event data. In the AFT model, the distribution of the unobserved random effect is conventionally assumed to be parametric, often modeled as a normal distribution. Although it has been known that a misspecfied random-effect distribution has little effect on regression parameter estimates, in some cases, the impact caused by such misspecification is not negligible. Particularly when our focus extends to quantities associated with random effects, the problem could become worse. In this paper, we propose a semi-parametric maximum likelihood approach in which the random-effect distribution under the AFT models is left unspecified. We provide a feasible algorithm to estimate the random-effect distribution as well as model parameters. Through comprehensive simulation studies, our results demonstrate the effectiveness of this proposed method across a range of random-effect distribution types (discrete or continuous) and under conditions of heavy censoring. The efficacy of the approach is further illustrated through simulation studies and real-world data examples.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variational Bayesian approach for analyzing interval-censored data under the proportional hazards model
Wenting Liu, Huiqiong Li, Niansheng Tang, Jun Lyu
Pub Date: 2024-03-26 | DOI: 10.1016/j.csda.2024.107957
Interval-censored failure time data frequently occur in medical follow-up studies, among other settings, and include right-censored data as a special case. Their analysis is much more difficult than that of right-censored data due to their more complicated structure and the absence of a partial likelihood. This article presents a variational Bayesian (VB) approach for analyzing such data under a proportional hazards model. The VB approach obtains a direct approximation of the posterior density. Compared to Markov chain Monte Carlo (MCMC)-based sampling approaches, the VB approach achieves enhanced computational efficiency without sacrificing estimation accuracy. An extensive simulation study comparing the performance of the proposed method with two main Bayesian methods currently available in the literature and the classic proportional hazards model indicates that it works well in practical situations.
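The likelihood being approximated is easy to write down: under a proportional hazards model, a subject observed in the interval (L, R] contributes S(L|x) - S(R|x). The sketch below makes this concrete with an assumed Weibull baseline hazard and plain maximum likelihood in place of the paper's variational Bayes machinery.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, shape0, scale0, beta0 = 500, 1.5, 2.0, np.array([0.7])
X = rng.standard_normal((n, 1))

# Draw event times from the Weibull PH model by inverting S(t | x) = U.
U = rng.uniform(size=n)
T = scale0 * (-np.log(U) / np.exp(X @ beta0)) ** (1.0 / shape0)

# Two inspection times per subject create the observed intervals (L, R].
insp = np.sort(rng.uniform(0.5, 4.0, size=(n, 2)), axis=1)
L = np.where(T <= insp[:, 0], 0.0, np.where(T <= insp[:, 1], insp[:, 0], insp[:, 1]))
R = np.where(T <= insp[:, 0], insp[:, 0], np.where(T <= insp[:, 1], insp[:, 1], np.inf))

def neg_log_lik(params, L, R, X):
    shape, scale, beta = np.exp(params[0]), np.exp(params[1]), params[2:]
    risk = np.exp(X @ beta)
    S = lambda t: np.exp(-((t / scale) ** shape) * risk)  # conditional survival
    # Right-censored subjects (R = inf) contribute S(L) - 0.
    S_R = np.where(np.isinf(R), 0.0, S(np.where(np.isinf(R), 1.0, R)))
    return -np.sum(np.log(S(L) - S_R + 1e-12))

fit = minimize(neg_log_lik, x0=np.zeros(3), args=(L, R, X))
print(fit.x[2])  # should be close to beta0 = 0.7
```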
{"title":"Variational Bayesian approach for analyzing interval-censored data under the proportional hazards model","authors":"Wenting Liu , Huiqiong Li , Niansheng Tang , Jun Lyu","doi":"10.1016/j.csda.2024.107957","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107957","url":null,"abstract":"<div><p>Interval-censored failure time data frequently occur in medical follow-up studies among others and include right-censored data as a special case. Their analysis is much difficult than the analysis of the right-censored data due to their much more complicated structures and no partial likelihood. This article presents a variational Bayesian (VB) approach for analyzing such data under a proportional hazards model. The VB approach obtains a direct approximation of the posterior density. Compared to the Markov chain Monte Carlo (MCMC)-based sampling approaches, the VB approach achieves enhanced computational efficiency without sacrificing estimation accuracy. An extensive simulation study is conducted to compare the performance of the proposed methods with two main Bayesian methods currently available in the literature and the classic proportional hazards model and indicates that they work well in practical situations.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140328151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable selection in Bayesian multiple instance regression using shotgun stochastic search
Seongoh Park, Joungyoun Kim, Xinlei Wang, Johan Lim
Pub Date: 2024-03-24 | DOI: 10.1016/j.csda.2024.107954
In multiple instance learning (MIL), a bag represents a sample that contains a set of instances, each of which is described by a vector of explanatory variables, but the entire bag has only one label/response. Though many methods for MIL have been developed to date, few have paid attention to the interpretability of models and results. The proposed Bayesian regression model rests on two levels of hierarchy, which transparently show how explanatory variables explain, and instances contribute to, bag responses. Moreover, two selection problems are addressed simultaneously: instance selection, to identify the instances in each bag responsible for the bag response, and variable selection, to search for the important covariates. To explore the joint discrete space of indicator variables created for the selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit the MIL context. The proposed model also offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. A simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data to predict binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods and can identify variables relevant in modeling responses.
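The search component generalizes beyond MIL, so the sketch below shows a generic shotgun stochastic search over binary inclusion indicators, scoring models by a simple BIC for linear regression instead of the paper's hierarchical MIL posterior, and using a single-flip neighbourhood for brevity (SSS variants typically also include swap moves).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3]] = [1.5, -2.0]                 # only columns 0 and 3 are active
y = X @ beta + rng.standard_normal(n)

def score(gamma):
    """Negative BIC of the least-squares fit on the selected columns."""
    k = int(gamma.sum())
    if k == 0:
        rss = np.sum((y - y.mean()) ** 2)
    else:
        Xg = X[:, gamma.astype(bool)]
        coef, res, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = res[0] if res.size else np.sum((y - Xg @ coef) ** 2)
    return -(n * np.log(rss / n) + k * np.log(n))

gamma = np.zeros(p, dtype=int)
best, best_score = gamma.copy(), score(gamma)
for _ in range(200):
    # Single-flip neighbourhood; sample the next model proportionally to score.
    nbrs = [gamma.copy() for _ in range(p)]
    for j in range(p):
        nbrs[j][j] ^= 1
    scores = np.array([score(g) for g in nbrs])
    probs = np.exp(scores - scores.max())
    gamma = nbrs[rng.choice(p, p=probs / probs.sum())]
    if score(gamma) > best_score:
        best, best_score = gamma.copy(), score(gamma)
print(best)  # should select the truly active columns 0 and 3
```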
{"title":"Variable selection in Bayesian multiple instance regression using shotgun stochastic search","authors":"Seongoh Park , Joungyoun Kim , Xinlei Wang , Johan Lim","doi":"10.1016/j.csda.2024.107954","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107954","url":null,"abstract":"<div><p>In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag only has one label/response. Though many methods for MIL have been developed to date, few have paid attention to interpretability of models and results. The proposed Bayesian regression model stands on two levels of hierarchy, which transparently show how explanatory variables explain and instances contribute to bag responses. Moreover, two selection problems are simultaneously addressed; the instance selection to find out the instances in each bag responsible for the bag response, and the variable selection to search for the important covariates. To explore a joint discrete space of indicator variables created for selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit in the MIL context. Also, the proposed model offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data for prediction of binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods, and can identify variables relevant in modeling responses.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140351163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-task learning regression via convex clustering
Akira Okazaki, Shuichi Kawano
Pub Date: 2024-03-24 | DOI: 10.1016/j.csda.2024.107956
Multi-task learning (MTL) is a methodology that aims to improve the general performance of estimation and prediction by sharing common information among related tasks. In MTL, several assumptions can be made about the relationships among tasks, along with methods to incorporate them. A natural assumption in practical situations is that tasks fall into clusters according to their characteristics. Under this assumption, the group fused regularization approach clusters the tasks by shrinking the differences among them, enabling the transfer of common information within the same cluster. However, this approach also transfers information between different clusters, which worsens estimation and prediction. To overcome this problem, an MTL method is proposed with centroid parameters representing the cluster centers of the tasks. Because this model separates the parameters for regression from the parameters for clustering, estimation and prediction accuracy for the regression coefficient vectors are improved. The effectiveness of the proposed method is shown through Monte Carlo simulations and applications to real data.
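A toy version of the centroid idea is sketched below: each task keeps its own coefficient vector, which is shrunk toward a task-specific centroid, and the centroids are pulled toward one another. For simplicity this sketch uses squared (ridge-type) penalties and plain gradient descent, whereas the paper's convex-clustering penalty yields exact fusion of centroids into clusters; all sizes and tuning values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n, p = 6, 80, 5
# Two latent clusters of tasks sharing the same true coefficient vectors.
w_true = np.where(np.arange(T)[:, None] < 3, 1.0, -1.0) * np.ones((T, p))
Xs = [rng.standard_normal((n, p)) for _ in range(T)]
ys = [Xs[t] @ w_true[t] + 0.3 * rng.standard_normal(n) for t in range(T)]

W = np.zeros((T, p))          # per-task regression coefficients
U = np.zeros((T, p))          # per-task centroids
lam1, lam2, lr = 1.0, 0.5, 1e-3
for _ in range(2000):
    for t in range(T):
        resid = ys[t] - Xs[t] @ W[t]
        W[t] -= lr * (-2.0 * Xs[t].T @ resid / n + 2.0 * lam1 * (W[t] - U[t]))
    # Each centroid is pulled toward its task and toward the other centroids.
    U -= lr * (2.0 * lam1 * (U - W) + 2.0 * lam2 * (T * U - U.sum(axis=0)))
print(np.round(U, 2))  # centroids should split into two groups
```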
{"title":"Multi-task learning regression via convex clustering","authors":"Akira Okazaki , Shuichi Kawano","doi":"10.1016/j.csda.2024.107956","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107956","url":null,"abstract":"<div><p>Multi-task learning (MTL) is a methodology that aims to improve the general performance of estimation and prediction by sharing common information among related tasks. In the MTL, there are several assumptions for the relationships and methods to incorporate them. One of the natural assumptions in the practical situation is that tasks are classified into some clusters with their characteristics. For this assumption, the group fused regularization approach performs clustering of the tasks by shrinking the difference among tasks. This enables the transfer of common information within the same cluster. However, this approach also transfers the information between different clusters, which worsens the estimation and prediction. To overcome this problem, an MTL method is proposed with a centroid parameter representing a cluster center of the task. Because this model separates parameters into the parameters for regression and the parameters for clustering, estimation and prediction accuracy for regression coefficient vectors are improved. The effectiveness of the proposed method is shown through Monte Carlo simulations and applications to real data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000409/pdfft?md5=67ff220c9ae2e0cf144b79296e79f566&pid=1-s2.0-S0167947324000409-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140290730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new algorithm for inference in HMM's with lower span complexity
Diogo Pereira, Cláudia Nunes, Rui Rodrigues
Pub Date: 2024-03-20 | DOI: 10.1016/j.csda.2024.107955
The maximum likelihood problem for Hidden Markov Models is usually solved numerically by the Baum-Welch algorithm, which uses the Expectation-Maximization algorithm to find the parameter estimates. This algorithm has a recursion depth equal to the data sample size and cannot be computed in parallel, which limits the ability of modern GPUs to speed up computation. A new algorithm is proposed that provides the same estimates as the Baum-Welch algorithm, requiring about the same number of iterations, but is designed in such a way that it can be parallelized. As a consequence, it leads to a significant reduction in computation time. This reduction is illustrated by means of numerical examples, where we consider simulated data as well as real datasets.
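For reference, here is the sequential scaled forward recursion that underlies the classical Baum-Welch algorithm: each step depends on the previous one, so the recursion depth equals the sample size, which is exactly the bottleneck the proposed parallelizable algorithm removes. The two-state parameterization is hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """HMM log-likelihood via the scaled forward algorithm.

    pi: (K,) initial distribution; A: (K, K) transitions; B: (K, M) emissions.
    """
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:                     # inherently sequential recursion
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

# A two-state example with a hypothetical parameterization.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
obs = np.array([0, 1, 1, 0, 1])
print(forward_log_likelihood(obs, pi, A, B))
```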
{"title":"A new algorithm for inference in HMM's with lower span complexity","authors":"Diogo Pereira , Cláudia Nunes , Rui Rodrigues","doi":"10.1016/j.csda.2024.107955","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107955","url":null,"abstract":"<div><p>The maximum likelihood problem for Hidden Markov Models is usually numerically solved by the Baum-Welch algorithm, which uses the Expectation-Maximization algorithm to find the estimates of the parameters. This algorithm has a recursion depth equal to the data sample size and cannot be computed in parallel, which limits the use of modern GPUs to speed up computation time. A new algorithm is proposed that provides the same estimates as the Baum-Welch algorithm, requiring about the same number of iterations, but is designed in such a way that it can be parallelized. As a consequence, it leads to a significant reduction in the computation time. This reduction is illustrated by means of numerical examples, where we consider simulated data as well as real datasets.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000392/pdfft?md5=f5b9ec83440b072fb6330eb5106ddb15&pid=1-s2.0-S0167947324000392-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140190987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A semiparametric model for the cause-specific hazard under risk proportionality
Simon M.S. Lo, Ralf A. Wilke, Takeshi Emura
Pub Date: 2024-03-19 | DOI: 10.1016/j.csda.2024.107953
Semiparametric Cox proportional hazards models enjoy great popularity in empirical survival analysis. A semiparametric model for cause-specific hazards under a proportionality restriction across risks is considered, which has desirable practical properties such as estimation by partial likelihood and an analytical solution for the copula-graphic estimator. The cause-specific and marginal hazards are shown to share functional form restrictions in this case. The model for the cause-specific hazard can be used for inference about parametric restrictions on the marginal hazard without the risk of misspecifying the latter and without knowing the risk dependence. After the class of parametric marginal hazards has been determined, it can be estimated in conjunction with the degree of risk dependence. Finite-sample properties are investigated with simulations. An application to employment duration demonstrates the practicality of the approach.
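One concrete consequence of proportionality across risks can be checked in a few lines: if the cause-specific hazards satisfy h1(t) = phi * h(t) and h2(t) = (1 - phi) * h(t) for all t, then the failure cause is independent of the failure time, and phi is consistently estimated by the observed fraction of cause-1 events even under independent censoring. The Weibull overall hazard below is assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n, phi = 100000, 0.3
# Overall failure time with a Weibull hazard (illustrative choice); under
# proportionality the cause is an independent Bernoulli(phi) label.
T = 2.0 * rng.weibull(1.5, n)
cause = np.where(rng.uniform(size=n) < phi, 1, 2)

c = rng.uniform(0.0, 4.0, n)                    # independent censoring
observed_cause = np.where(T <= c, cause, 0)     # 0 marks a censored spell
events = observed_cause > 0
print(f"phi_hat = {(observed_cause == 1).sum() / events.sum():.3f} (true {phi})")
```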
{"title":"A semiparametric model for the cause-specific hazard under risk proportionality","authors":"Simon M.S. Lo , Ralf A. Wilke , Takeshi Emura","doi":"10.1016/j.csda.2024.107953","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107953","url":null,"abstract":"<div><p>Semiparametric Cox proportional hazards models enjoy great popularity in empirical survival analysis. A semiparametric model for cause-specific hazards under a proportionality restriction across risks is considered, which has desired practical properties such as estimation by partial likelihood and an analytical solution for the copula-graphic estimator. The cause-specific and marginal hazards are shown to share functional form restrictions in this case. The model for the cause-specific hazard can be used for inference about parametric restrictions on the marginal hazard without the risk of misspecifying the latter and without knowing the risk dependence. After the class of parametric marginal hazards has been determined, it can be estimated in conjunction with the degree of risk dependence. Finite sample properties are investigated with simulations. An application to employment duration demonstrates the practicality of the approach.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000379/pdfft?md5=4bc502f302c73b799bcbf656f0393576&pid=1-s2.0-S0167947324000379-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140190988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}