Computational Statistics & Data Analysis最新文献

英文中文

A subspace method for large-scale trace ratio problems

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-12-13 DOI: 10.1016/j.csda.2024.108108

Giulia Ferrandi , Michiel E. Hochstenbach , M. Rosário Oliveira

A subspace method is introduced to solve large-scale trace ratio problems. This approach is matrix-free, requiring only the action of the two matrices involved in the trace ratio. At each iteration, a smaller trace ratio problem is addressed in the search subspace. Additionally, the algorithm is endowed with a restarting strategy, that ensures the monotonicity of the trace ratio value throughout the iterations. The behavior of the approximate solution is investigated from a theoretical viewpoint, extending existing results on Ritz values and vectors, as the angle between the search subspace and the exact solution approaches zero. Numerical experiments in multigroup classification show that this new subspace method tends to be more efficient than iterative approaches relying on (partial) eigenvalue decompositions at each step.

引用次数: 0

Subgroup learning for multiple mixed-type outcomes with block-structured covariates

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-12-02 DOI: 10.1016/j.csda.2024.108105

Xun Zhao , Lu Tang , Weijia Zhang , Ling Zhou

The increasing interest in survey research focuses on inferring grouped association patterns between risk factors and questionnaire responses, with grouping shared across multiple response variables that jointly capture one's underlying status. Aiming to identify important risk factors that are simultaneously associated with the health and well-being of senior adults, a study based on the China Health and Retirement Survey (CHRS) is conducted. Previous studies have identified several known risk factors, yet heterogeneity in the outcome-risk factor association exists, prompting the use of subgroup analysis. A subgroup analysis procedure is devised to model a multiple mixed-type outcome which describes one's general health and well-being, while tackling additional challenges including collinearity and weak signals within block-structured covariates. Computationally, an efficient algorithm that alternately updates a set of estimating equations and likelihood functions is proposed. Theoretical results establish the asymptotic consistency and normality of the proposed estimators. The validity of the proposed method is corroborated by simulation experiments. An application of the proposed method to the CHRS data identifies caring for grandchildren as a new risk factor for poor physical and mental health.

{"title":"Subgroup learning for multiple mixed-type outcomes with block-structured covariates","authors":"Xun Zhao , Lu Tang , Weijia Zhang , Ling Zhou","doi":"10.1016/j.csda.2024.108105","DOIUrl":"10.1016/j.csda.2024.108105","url":null,"abstract":"<div><div>The increasing interest in survey research focuses on inferring grouped association patterns between risk factors and questionnaire responses, with grouping shared across multiple response variables that jointly capture one's underlying status. Aiming to identify important risk factors that are simultaneously associated with the health and well-being of senior adults, a study based on the China Health and Retirement Survey (CHRS) is conducted. Previous studies have identified several known risk factors, yet heterogeneity in the outcome-risk factor association exists, prompting the use of subgroup analysis. A subgroup analysis procedure is devised to model a multiple mixed-type outcome which describes one's general health and well-being, while tackling additional challenges including collinearity and weak signals within block-structured covariates. Computationally, an efficient algorithm that alternately updates a set of estimating equations and likelihood functions is proposed. Theoretical results establish the asymptotic consistency and normality of the proposed estimators. The validity of the proposed method is corroborated by simulation experiments. An application of the proposed method to the CHRS data identifies caring for grandchildren as a new risk factor for poor physical and mental health.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108105"},"PeriodicalIF":1.5,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A copula duration model with dependent states and spells

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-28 DOI: 10.1016/j.csda.2024.108104

Simon M.S. Lo , Shuolin Shi , Ralf A. Wilke

A nested Archimedean copula model for dependent states and spells is introduced and the link to a classical survival model with frailties is established. The model relaxes an important restriction of classical survival models as the distributions of unobservable heterogeneities are permitted to depend on the observable covariates. Its modular structure has practical advantages as the different components can be separately specified and estimation can be done sequentially or separately. This makes the model versatile and adaptable in empirical work. An application to labour market transitions with linked administrative data supports the need for a flexible specification of the dependence structure and the model for the marginal survivals. The conventional Markov Chain Model is shown to give sizeably biased results in the application.

引用次数: 0

Development of a new general class of bivariate distributions based on reversed hazard rate order 基于逆向风险率顺序的一类新的二元分布的发展

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-26 DOI: 10.1016/j.csda.2024.108106

Na Young Yoo , Hyunju Lee , Ji Hwan Cha

Motivated by real data sets to be analyzed in this paper, we develop a new general class of bivariate distributions that can model the effect of the so-called ‘load-sharing configuration’ in a system with two components based on the reversed hazard rate. Under such load-sharing configuration, after the failure of one component, the surviving component has to shoulder extra load, which eventually results in its failure at an earlier time than what is expected under the case of independence. In the developed class of bivariate distributions, it is assumed that the residual lifetime of the remaining component is shortened according to the reversed hazard rate order. We derive the joint survival function, joint probability density function and the marginal distributions. We discuss a bivariate ageing property of the developed class of distributions. Some specific families of bivariate distributions which can be usefully applied in practice are obtained. These families of bivariate distributions are applied to some real data sets to illustrate their usefulness.

在本文要分析的真实数据集的激励下，我们开发了一种新的一般类型的二元分布，它可以基于反向风险率对具有两个组件的系统中所谓的“负载共享配置”的影响进行建模。在这种负载分担配置下，在一个组件失效后，幸存的组件必须承担额外的负载，最终导致该组件的失效时间比独立情况下的预期时间要早。在二元分布的发达类别中，假定剩余成分的剩余寿命按照颠倒的危险率顺序缩短。导出了联合生存函数、联合概率密度函数和边际分布。我们讨论了已发展的一类分布的二元老化性质。得到了一些可以在实际中应用的特殊的二元分布族。这些双变量分布族应用于一些实际数据集来说明它们的有用性。

引用次数: 0

Multi-task optimization with Bayesian neural network surrogates for parameter estimation of a simulation model 利用贝叶斯神经网络代理进行多任务优化，以估算仿真模型参数

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-22 DOI: 10.1016/j.csda.2024.108097

Hyungjin Kim , Chuljin Park , Heeyoung Kim

We propose a novel framework for efficient parameter estimation in simulation models, formulated as an optimization problem that minimizes the discrepancy between physical system observations and simulation model outputs. Our framework, called multi-task optimization with Bayesian neural network surrogates (MOBS), is designed for scenarios that require the simultaneous estimation of multiple sets of parameters, each set corresponding to a distinct set of observations, while also enabling fast parameter estimation essential for real-time process monitoring and control. MOBS integrates a heuristic search algorithm, utilizing a single-layer Bayesian neural network surrogate model trained on an initial simulation dataset. This surrogate model is shared across multiple tasks to select and evaluate candidate parameter values, facilitating efficient multi-task optimization. We provide a closed-form parameter screening rule and demonstrate that the expected number of simulation runs converges to a user-specified threshold. Our framework was applied to a numerical example and a semiconductor manufacturing case study, significantly reducing computational costs while achieving accurate parameter estimation.

我们为仿真模型中的高效参数估计提出了一个新框架，该框架被表述为一个优化问题，可最大限度地减少物理系统观测结果与仿真模型输出结果之间的差异。我们的框架名为 "贝叶斯神经网络代理多任务优化（MOBS）"，专为需要同时估算多组参数（每组参数对应一组不同的观测值）的情况而设计，同时还能实现实时过程监测和控制所必需的快速参数估算。MOBS 集成了启发式搜索算法，利用在初始模拟数据集上训练的单层贝叶斯神经网络代理模型。该代理模型在多个任务中共享，用于选择和评估候选参数值，从而促进高效的多任务优化。我们提供了一个闭式参数筛选规则，并证明预期模拟运行次数会收敛到用户指定的阈值。我们的框架被应用于一个数值示例和一个半导体制造案例研究，在实现精确参数估计的同时大大降低了计算成本。

{"title":"Multi-task optimization with Bayesian neural network surrogates for parameter estimation of a simulation model","authors":"Hyungjin Kim , Chuljin Park , Heeyoung Kim","doi":"10.1016/j.csda.2024.108097","DOIUrl":"10.1016/j.csda.2024.108097","url":null,"abstract":"<div><div>We propose a novel framework for efficient parameter estimation in simulation models, formulated as an optimization problem that minimizes the discrepancy between physical system observations and simulation model outputs. Our framework, called multi-task optimization with Bayesian neural network surrogates (MOBS), is designed for scenarios that require the simultaneous estimation of multiple sets of parameters, each set corresponding to a distinct set of observations, while also enabling fast parameter estimation essential for real-time process monitoring and control. MOBS integrates a heuristic search algorithm, utilizing a single-layer Bayesian neural network surrogate model trained on an initial simulation dataset. This surrogate model is shared across multiple tasks to select and evaluate candidate parameter values, facilitating efficient multi-task optimization. We provide a closed-form parameter screening rule and demonstrate that the expected number of simulation runs converges to a user-specified threshold. Our framework was applied to a numerical example and a semiconductor manufacturing case study, significantly reducing computational costs while achieving accurate parameter estimation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108097"},"PeriodicalIF":1.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Optimal sequential detection by sparsity likelihood 利用稀疏似然法优化顺序检测

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-17 DOI: 10.1016/j.csda.2024.108089

Jingyan Huang, Hock Peng Chan

We propose here a sparsity likelihood stopping rule to detect change-points when there are multiple data streams. It is optimal in the sense of minimizing, asymptotically, the detection delay when the change-points is present in only a small fraction of the data streams. This optimality holds at all levels of change-point sparsity. A key contribution of this paper is that we show optimality when there is extreme sparsity. Extreme sparsity refers to the number of data streams with change-points increasing very slowly as the number of data streams goes to infinity. The theoretical results are backed by a numerical study that shows the sparsity likelihood stopping rule performing well at all levels of sparsity. Applications of the stopping rule on non-normal models are also illustrated here.

在此，我们提出了一种稀疏似然停止规则，用于在存在多个数据流时检测变化点。当变化点只出现在一小部分数据流中时，该规则在渐近最小化检测延迟的意义上是最优的。这种最优性在所有变化点稀疏程度上都成立。本文的一个重要贡献是，我们展示了极端稀疏性时的最优性。所谓极端稀疏性，是指当数据流的数量达到无穷大时，数据流中变化点的数量会非常缓慢地增加。理论结果得到了数值研究的支持，数值研究显示稀疏性似然停止规则在所有稀疏性水平下都表现良好。这里还说明了停止规则在非正态模型中的应用。

引用次数: 0

Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation 通过近似贝叶斯计算从真实动作电位数据推断随机菲茨休-纳古莫模型

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-15 DOI: 10.1016/j.csda.2024.108095

Adeline Samson , Massimiliano Tamborrino , Irene Tubikanec

The stochastic FitzHugh-Nagumo (FHN) model is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low-dimensionality, its analytical and numerical tractability and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise or a complex architecture (e.g., to estimate the transition density of the process, ‘‘recovering’’ the unobserved second component) and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. The proposed method tackles all challenges (non-globally Lipschitz drift, non-explicit solution, lack of available transition density, degeneracy of the noise and partial observations). It is an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm, which relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation and on summary statistics exploiting the structural properties of the process. All model parameters are successfully estimated from simulated data and, more remarkably, real action potential data of rats. The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.

随机菲茨休-纳古莫（FHN）模型是一个带有加性退化噪声的二维非线性随机微分方程，其第一个分量（唯一观测到的分量）描述了单个神经元的膜电压演变。由于该模型的低维度、分析和数值上的可操作性以及对神经元的解释，该模型已被用作案例研究，以测试不同统计方法在估计基本模型参数方面的性能。然而，现有的方法往往需要完整的观测数据、噪声的非退化性或复杂的结构（例如，估计过程的过渡密度，"恢复 "未观测到的第二分量），而且它们可能无法（令人满意地）同时估计所有模型参数。此外，这些研究缺乏随机 FHN 模型的真实数据应用。所提出的方法可以应对所有挑战（非全局 Lipschitz 漂移、非显式解法、缺乏可用的过渡密度、噪声和部分观测值的退化）。它是一种直观且易于实现的顺序蒙特卡罗近似贝叶斯计算算法，依赖于最新的计算高效且结构保留的数值分裂方案来生成合成数据，并依赖于利用过程结构特性的汇总统计。所有模型参数都能从模拟数据中成功估算出来，更值得一提的是，大鼠的真实动作电位数据也能估算出来。这种新颖的真实数据拟合方法可拓宽这一经典和广泛应用的神经元模型的应用范围和可信度。

{"title":"Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation","authors":"Adeline Samson , Massimiliano Tamborrino , Irene Tubikanec","doi":"10.1016/j.csda.2024.108095","DOIUrl":"10.1016/j.csda.2024.108095","url":null,"abstract":"<div><div>The stochastic FitzHugh-Nagumo (FHN) model is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low-dimensionality, its analytical and numerical tractability and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise or a complex architecture (e.g., to estimate the transition density of the process, ‘‘recovering’’ the unobserved second component) and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. The proposed method tackles all challenges (non-globally Lipschitz drift, non-explicit solution, lack of available transition density, degeneracy of the noise and partial observations). It is an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm, which relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation and on summary statistics exploiting the structural properties of the process. All model parameters are successfully estimated from simulated data and, more remarkably, real action potential data of rats. The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108095"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

High-dimensional copula-based Wasserstein dependence 基于 Wasserstein 依赖性的高维协程

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-15 DOI: 10.1016/j.csda.2024.108096

Steven De Keyser, Irène Gijbels

The aim is to generalize 2-Wasserstein dependence coefficients to measure dependence between a finite number of random vectors. This generalization includes theoretical properties, and in particular focuses on an interpretation of maximal dependence and an asymptotic normality result for a proposed semi-parametric estimator under a Gaussian copula assumption. In addition, it is of interest to look at general axioms for dependence measures between multiple random vectors, at plausible normalizations, and at various examples. Afterwards, it is important to study plug-in estimators based on penalized empirical covariance matrices in order to deal with high dimensionality issues and taking possible marginal independencies into account by inducing (block) sparsity. The latter ideas are investigated via a simulation study, considering other dependence coefficients as well. The use of the developed methods is illustrated in two real data applications.

目的是将 2-Wasserstein 依赖系数推广到测量有限数量随机向量之间的依赖性。这种概括包括理论属性，尤其侧重于对最大依赖性的解释，以及在高斯共轭假设下所提出的半参数估计器的渐近正态性结果。此外，研究多个随机向量之间依赖性度量的一般公理、可信的归一化以及各种实例也很有意义。之后，重要的是研究基于惩罚性经验协方差矩阵的插件估计器，以处理高维度问题，并通过诱导（块）稀疏性将可能的边际独立性考虑在内。通过模拟研究，同时考虑其他依赖系数，对后一种想法进行了研究。在两个真实数据应用中说明了所开发方法的用途。

引用次数: 0

Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves 对不规则多变量曲线进行高效的贝叶斯函数主成分分析

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-12 DOI: 10.1016/j.csda.2024.108094

Tui H. Nolan , Sylvia Richardson , Hélène Ruffieux

The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for the current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is conducted using variational inference, ensuring that accurate posterior approximation and robust uncertainty quantification are achieved. The algorithm also introduces a novel variational message passing fragment for multivariate functional principal component Gaussian likelihood that enables modularity and reuse across models. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package bayesFPCA.

多变量函数曲线分析有可能在医疗保健、医学、经济学和社会科学等领域产生重要的科学发现。然而，现实世界中常见的纵向数据既不规则又观测稀疏，这给当前的函数数据方法带来了重大挑战。本文提出了一种用于多元函数主成分分析的贝叶斯分层框架，该框架通过灵活地汇集受试者和相关曲线的信息，来适应这种不规则观测环境的复杂性。该模型通过共享的功能主成分得分来表示共同的潜在动态，从而有效地借用曲线间的力量，同时避免了估计协方差矩阵这一具有计算挑战性的任务。这些分数还提供了曲线联合变化主要模式的简明表述，并构成了可在后续分析中使用的可解释的标量总结。使用变异推理进行估计，确保实现精确的后验近似和稳健的不确定性量化。该算法还为多元函数主成分高斯似然引入了一个新颖的变分信息传递片段，实现了模块化和跨模型重用。详细的模拟评估了该方法在共享稀疏和不规则采样多元曲线信息方面的有效性。这项研究揭示了免疫、炎症和新陈代谢系统的关键协调动态，这些系统与发病后一年内的长COVID症状有关。该方法在 R 软件包 bayesFPCA 中实现。

{"title":"Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves","authors":"Tui H. Nolan , Sylvia Richardson , Hélène Ruffieux","doi":"10.1016/j.csda.2024.108094","DOIUrl":"10.1016/j.csda.2024.108094","url":null,"abstract":"<div><div>The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for the current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is conducted using variational inference, ensuring that accurate posterior approximation and robust uncertainty quantification are achieved. The algorithm also introduces a novel variational message passing fragment for multivariate functional principal component Gaussian likelihood that enables modularity and reuse across models. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package <span>bayesFPCA</span>.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108094"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Dirichlet process model for directional-linear data with application to bloodstain pattern analysis 应用于血迹模式分析的定向线性数据的狄利克特过程模型

IF 1.5 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis

Pub Date : 2024-11-12 DOI: 10.1016/j.csda.2024.108093

Tong Zou, Hal S. Stern

Directional data require specialized models because of the non-Euclidean nature of their domain. When a directional variable is observed jointly with linear variables, modeling their dependence adds an additional layer of complexity. A Bayesian nonparametric approach is introduced to analyze directional-linear data. Firstly, the projected normal distribution is extended to model the joint distribution of linear variables and a directional variable with arbitrary dimension projected from a higher-dimensional augmented multivariate normal distribution. The new distribution is called the semi-projected normal distribution (SPN) and can be used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. Then, a conditional inverse-Wishart distribution is proposed as part of the prior distribution to address an identifiability issue inherited from the projected normal and preserve conjugacy with the SPN. The SPN mixture model shows superior performance in clustering on synthetic data compared to the semi-wrapped Gaussian model. The experiments show the ability of the SPN mixture model to characterize bloodstain patterns. A hierarchical Dirichlet process model with the SPN distribution is built to estimate the likelihood of bloodstain patterns under a posited causal mechanism for use in a likelihood ratio approach to the analysis of forensic bloodstain pattern evidence.

方向性数据需要专门的模型，因为其领域具有非欧几里得性质。当一个方向性变量与线性变量一起被观测时，对它们的依赖性建模又增加了一层复杂性。本文引入了一种贝叶斯非参数方法来分析方向线性数据。首先，对投影正态分布进行了扩展，以模拟线性变量和任意维度的方向变量的联合分布。新分布被称为半投影正态分布（SPN），可用作 Dirichlet 过程模型中的混合分布，从而获得一类更灵活的方向线性数据模型。然后，提出了一种条件反 Wishart 分布作为先验分布的一部分，以解决投影正态分布遗留下来的可识别性问题，并保持与 SPN 的共轭性。与半包高斯模型相比，SPN 混合模型在合成数据的聚类中表现出更优越的性能。实验表明，SPN 混合物模型有能力描述血迹模式。建立了一个具有 SPN 分布的分层 Dirichlet 过程模型，用于估算血迹模式在假定因果机制下的可能性，并将其用于分析法医血迹模式证据的似然比方法。

{"title":"A Dirichlet process model for directional-linear data with application to bloodstain pattern analysis","authors":"Tong Zou, Hal S. Stern","doi":"10.1016/j.csda.2024.108093","DOIUrl":"10.1016/j.csda.2024.108093","url":null,"abstract":"<div><div>Directional data require specialized models because of the non-Euclidean nature of their domain. When a directional variable is observed jointly with linear variables, modeling their dependence adds an additional layer of complexity. A Bayesian nonparametric approach is introduced to analyze directional-linear data. Firstly, the projected normal distribution is extended to model the joint distribution of linear variables and a directional variable with arbitrary dimension projected from a higher-dimensional augmented multivariate normal distribution. The new distribution is called the semi-projected normal distribution (SPN) and can be used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. Then, a conditional inverse-Wishart distribution is proposed as part of the prior distribution to address an identifiability issue inherited from the projected normal and preserve conjugacy with the SPN. The SPN mixture model shows superior performance in clustering on synthetic data compared to the semi-wrapped Gaussian model. The experiments show the ability of the SPN mixture model to characterize bloodstain patterns. A hierarchical Dirichlet process model with the SPN distribution is built to estimate the likelihood of bloodstain patterns under a posited causal mechanism for use in a likelihood ratio approach to the analysis of forensic bloodstain pattern evidence.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108093"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Computational Statistics & Data Analysis

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀