Development of a new general class of bivariate distributions based on reversed hazard rate order
Pub Date : 2024-11-26 | DOI: 10.1016/j.csda.2024.108106
Na Young Yoo , Hyunju Lee , Ji Hwan Cha
Motivated by real data sets to be analyzed in this paper, we develop a new general class of bivariate distributions that can model the effect of the so-called ‘load-sharing configuration’ in a two-component system based on the reversed hazard rate. Under such a load-sharing configuration, after the failure of one component, the surviving component must shoulder extra load, which eventually results in its failure earlier than would be expected under independence. In the developed class of bivariate distributions, it is assumed that the residual lifetime of the remaining component is shortened according to the reversed hazard rate order. We derive the joint survival function, the joint probability density function and the marginal distributions. We discuss a bivariate ageing property of the developed class of distributions. Specific families of bivariate distributions that can be usefully applied in practice are obtained. These families are applied to real data sets to illustrate their usefulness.
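For reference, the reversed hazard rate and the reversed hazard rate order invoked above have standard definitions, sketched here in generic notation (not necessarily the paper's):

```latex
% Reversed hazard rate of a lifetime X with cdf F_X and density f_X:
\tilde{r}_X(t) \;=\; \frac{f_X(t)}{F_X(t)} \;=\; \frac{d}{dt}\log F_X(t).
% Reversed hazard rate order:
X \le_{\mathrm{rh}} Y
\;\Longleftrightarrow\;
\tilde{r}_X(t) \le \tilde{r}_Y(t) \ \text{for all } t
\;\Longleftrightarrow\;
F_Y(t)/F_X(t) \ \text{is nondecreasing in } t.
```

In the load-sharing setting described above, the surviving component's residual lifetime is taken to be smaller, in this ordering, than it would be under independence.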
{"title":"Development of a new general class of bivariate distributions based on reversed hazard rate order","authors":"Na Young Yoo , Hyunju Lee , Ji Hwan Cha","doi":"10.1016/j.csda.2024.108106","DOIUrl":"10.1016/j.csda.2024.108106","url":null,"abstract":"<div><div>Motivated by real data sets to be analyzed in this paper, we develop a new general class of bivariate distributions that can model the effect of the so-called ‘load-sharing configuration’ in a system with two components based on the reversed hazard rate. Under such load-sharing configuration, after the failure of one component, the surviving component has to shoulder extra load, which eventually results in its failure at an earlier time than what is expected under the case of independence. In the developed class of bivariate distributions, it is assumed that the residual lifetime of the remaining component is shortened according to the reversed hazard rate order. We derive the joint survival function, joint probability density function and the marginal distributions. We discuss a bivariate ageing property of the developed class of distributions. Some specific families of bivariate distributions which can be usefully applied in practice are obtained. These families of bivariate distributions are applied to some real data sets to illustrate their usefulness.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108106"},"PeriodicalIF":1.5,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142747095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-task optimization with Bayesian neural network surrogates for parameter estimation of a simulation model
Pub Date : 2024-11-22 | DOI: 10.1016/j.csda.2024.108097
Hyungjin Kim , Chuljin Park , Heeyoung Kim
We propose a novel framework for efficient parameter estimation in simulation models, formulated as an optimization problem that minimizes the discrepancy between physical system observations and simulation model outputs. Our framework, called multi-task optimization with Bayesian neural network surrogates (MOBS), is designed for scenarios that require the simultaneous estimation of multiple sets of parameters, each set corresponding to a distinct set of observations, while also enabling fast parameter estimation essential for real-time process monitoring and control. MOBS integrates a heuristic search algorithm, utilizing a single-layer Bayesian neural network surrogate model trained on an initial simulation dataset. This surrogate model is shared across multiple tasks to select and evaluate candidate parameter values, facilitating efficient multi-task optimization. We provide a closed-form parameter screening rule and demonstrate that the expected number of simulation runs converges to a user-specified threshold. Our framework was applied to a numerical example and a semiconductor manufacturing case study, significantly reducing computational costs while achieving accurate parameter estimation.
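As a rough illustration of the surrogate-sharing idea (a minimal sketch, not the authors' MOBS implementation: the single-layer Bayesian neural network is approximated here by a random-feature network with a Bayesian linear output layer, and the simulator, discrepancy, and all parameter values are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianSurrogate:
    """Random-feature network with a Bayesian linear output layer."""

    def __init__(self, dim, n_hidden=50, noise_var=0.1, prior_var=1.0):
        self.W = rng.normal(size=(dim, n_hidden))          # fixed random hidden layer
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_hidden)
        self.noise_var, self.prior_var = noise_var, prior_var

    def _features(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        Phi = self._features(X)
        A = Phi.T @ Phi / self.noise_var + np.eye(Phi.shape[1]) / self.prior_var
        self.cov = np.linalg.inv(A)                        # posterior covariance
        self.mean = self.cov @ Phi.T @ y / self.noise_var  # posterior mean

    def predict(self, X):
        Phi = self._features(X)
        mu = Phi @ self.mean
        var = np.einsum("ij,jk,ik->i", Phi, self.cov, Phi) + self.noise_var
        return mu, var

# Initial simulation dataset from a stand-in simulator (an assumption for this demo).
theta_train = rng.uniform(-2, 2, size=(30, 2))
y_train = np.sin(theta_train[:, 0]) + theta_train[:, 1]

surrogate = BayesianSurrogate(dim=2)
surrogate.fit(theta_train, y_train)

# The single surrogate is shared by all tasks: each task scores the same
# candidate pool against its own observation, with no extra simulator runs.
observations = [0.3, -0.8, 1.1]                            # one observation per task
candidates = rng.uniform(-2, 2, size=(500, 2))
mu, var = surrogate.predict(candidates)
for k, obs in enumerate(observations):
    score = (mu - obs) ** 2 + var                          # expected squared discrepancy
    print(f"task {k}: theta_hat = {candidates[np.argmin(score)]}")
```

Because the surrogate is fitted once and reused, per-task candidate screening costs only cheap predictions rather than new simulation runs.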
{"title":"Multi-task optimization with Bayesian neural network surrogates for parameter estimation of a simulation model","authors":"Hyungjin Kim , Chuljin Park , Heeyoung Kim","doi":"10.1016/j.csda.2024.108097","DOIUrl":"10.1016/j.csda.2024.108097","url":null,"abstract":"<div><div>We propose a novel framework for efficient parameter estimation in simulation models, formulated as an optimization problem that minimizes the discrepancy between physical system observations and simulation model outputs. Our framework, called multi-task optimization with Bayesian neural network surrogates (MOBS), is designed for scenarios that require the simultaneous estimation of multiple sets of parameters, each set corresponding to a distinct set of observations, while also enabling fast parameter estimation essential for real-time process monitoring and control. MOBS integrates a heuristic search algorithm, utilizing a single-layer Bayesian neural network surrogate model trained on an initial simulation dataset. This surrogate model is shared across multiple tasks to select and evaluate candidate parameter values, facilitating efficient multi-task optimization. We provide a closed-form parameter screening rule and demonstrate that the expected number of simulation runs converges to a user-specified threshold. Our framework was applied to a numerical example and a semiconductor manufacturing case study, significantly reducing computational costs while achieving accurate parameter estimation.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108097"},"PeriodicalIF":1.5,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal sequential detection by sparsity likelihood
Pub Date : 2024-11-17 | DOI: 10.1016/j.csda.2024.108089
Jingyan Huang, Hock Peng Chan
We propose a sparsity likelihood stopping rule to detect change-points when there are multiple data streams. It is optimal in the sense of asymptotically minimizing the detection delay when change-points are present in only a small fraction of the data streams. This optimality holds at all levels of change-point sparsity. A key contribution of this paper is that we show optimality under extreme sparsity, in which the number of data streams with change-points grows very slowly as the number of data streams goes to infinity. The theoretical results are backed by a numerical study showing that the sparsity likelihood stopping rule performs well at all levels of sparsity. Applications of the stopping rule to non-normal models are also illustrated.
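The paper's statistic is not reproduced here, but the following minimal sketch conveys the sequential multi-stream setting: per-stream CUSUM scores are pooled with a crude sparsity-style penalty, and monitoring stops at a threshold crossing (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_streams, shift, threshold = 200, 1.0, 50.0
affected = rng.choice(n_streams, size=3, replace=False)  # sparse set of changed streams

cusum = np.zeros(n_streams)
t = 0
while True:
    t += 1
    x = rng.normal(size=n_streams)
    if t > 100:                                  # change-point at time 100
        x[affected] += shift
    # one-sided CUSUM for a mean shift of size `shift` in each stream
    cusum = np.maximum(0.0, cusum + shift * (x - shift / 2.0))
    # pool only streams whose evidence exceeds a log(n)-type sparsity penalty
    score = np.sum(np.maximum(cusum - np.log(n_streams), 0.0))
    if score > threshold:
        print(f"declared change at t={t}, top streams: {np.argsort(cusum)[-3:]}")
        break
```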
{"title":"Optimal sequential detection by sparsity likelihood","authors":"Jingyan Huang, Hock Peng Chan","doi":"10.1016/j.csda.2024.108089","DOIUrl":"10.1016/j.csda.2024.108089","url":null,"abstract":"<div><div>We propose here a sparsity likelihood stopping rule to detect change-points when there are multiple data streams. It is optimal in the sense of minimizing, asymptotically, the detection delay when the change-points is present in only a small fraction of the data streams. This optimality holds at all levels of change-point sparsity. A key contribution of this paper is that we show optimality when there is extreme sparsity. Extreme sparsity refers to the number of data streams with change-points increasing very slowly as the number of data streams goes to infinity. The theoretical results are backed by a numerical study that shows the sparsity likelihood stopping rule performing well at all levels of sparsity. Applications of the stopping rule on non-normal models are also illustrated here.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108089"},"PeriodicalIF":1.5,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation
Pub Date : 2024-11-15 | DOI: 10.1016/j.csda.2024.108095
Adeline Samson , Massimiliano Tamborrino , Irene Tubikanec
The stochastic FitzHugh-Nagumo (FHN) model is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low dimensionality, its analytical and numerical tractability and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise or a complex architecture (e.g., to estimate the transition density of the process, ‘recovering’ the unobserved second component), and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. The proposed method tackles all of these challenges (non-globally Lipschitz drift, non-explicit solution, lack of an available transition density, degeneracy of the noise and partial observations). It is an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm, which relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation and on summary statistics exploiting the structural properties of the process. All model parameters are successfully estimated from simulated data and, more remarkably, from real action potential data of rats. The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.
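A minimal sketch of the setting, assuming one common parametrization of the hypoelliptic FHN SDE (noise on the unobserved second coordinate only) and a plain Euler-Maruyama discretization; the paper instead uses a structure-preserving splitting scheme, and the parameter values and summaries below are illustrative stand-ins:

```python
import numpy as np

def simulate_fhn(eps, gamma, beta, sigma, s=0.0, T=50.0, dt=1e-3, seed=0):
    """Euler-Maruyama path of a hypoelliptic FHN SDE; only the voltage V is returned."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    v, u = 0.0, 0.0
    V = np.empty(n)
    for i in range(n):
        dv = (v - v**3 - u + s) / eps          # deterministic voltage equation
        du = gamma * v - u + beta              # noisy, unobserved recovery variable
        v += dv * dt
        u += du * dt + sigma * np.sqrt(dt) * rng.normal()
        V[i] = v
    return V

def summaries(V):
    """Structure-based summaries of the observed voltage (illustrative choices)."""
    hist, _ = np.histogram(V, bins=20, range=(-2.5, 2.5), density=True)
    spec = np.abs(np.fft.rfft(V - V.mean()))[:50]
    return np.concatenate([hist, spec / spec.sum()])

obs = summaries(simulate_fhn(0.1, 1.5, 0.8, 0.3, s=0.01, seed=42))  # pseudo-observed data
cand = summaries(simulate_fhn(0.1, 1.4, 0.7, 0.3, s=0.01, seed=1))  # one ABC proposal
print("ABC distance:", np.linalg.norm(obs - cand))
```

An ABC-SMC scheme would repeat the proposal step over a population of parameter draws, keeping those whose summary distance falls below a shrinking tolerance.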
{"title":"Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation","authors":"Adeline Samson , Massimiliano Tamborrino , Irene Tubikanec","doi":"10.1016/j.csda.2024.108095","DOIUrl":"10.1016/j.csda.2024.108095","url":null,"abstract":"<div><div>The stochastic FitzHugh-Nagumo (FHN) model is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low-dimensionality, its analytical and numerical tractability and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise or a complex architecture (e.g., to estimate the transition density of the process, ‘‘recovering’’ the unobserved second component) and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. The proposed method tackles all challenges (non-globally Lipschitz drift, non-explicit solution, lack of available transition density, degeneracy of the noise and partial observations). It is an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm, which relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation and on summary statistics exploiting the structural properties of the process. All model parameters are successfully estimated from simulated data and, more remarkably, real action potential data of rats. The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108095"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-dimensional copula-based Wasserstein dependence
Pub Date : 2024-11-15 | DOI: 10.1016/j.csda.2024.108096
Steven De Keyser, Irène Gijbels
The aim is to generalize 2-Wasserstein dependence coefficients to measure dependence between a finite number of random vectors. This generalization includes theoretical properties, and in particular focuses on an interpretation of maximal dependence and an asymptotic normality result for a proposed semi-parametric estimator under a Gaussian copula assumption. General axioms for dependence measures between multiple random vectors, plausible normalizations, and various examples are also considered. Plug-in estimators based on penalized empirical covariance matrices are then studied in order to deal with high dimensionality and to account for possible marginal independencies by inducing (block) sparsity. The latter ideas are investigated via a simulation study that also considers other dependence coefficients. The use of the developed methods is illustrated in two real data applications.
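A hedged sketch of the main ingredient: under a Gaussian copula, dependence between two random vectors can be quantified by a Bures-Wasserstein distance between the normal-scores correlation matrix and its independence (block-diagonal) version. The estimator and lack of normalization below are illustrative, not the paper's exact coefficient:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import norm, rankdata

def bures_wasserstein(S1, S2):
    """Squared 2-Wasserstein distance between centered Gaussians N(0, S1), N(0, S2)."""
    r = sqrtm(S2)
    cross = sqrtm(r @ S1 @ r)
    return np.trace(S1 + S2 - 2.0 * cross).real

def wasserstein_dependence(X, d1):
    """Compare normal-scores correlation with its cross-block-zeroed version."""
    U = rankdata(X, axis=0) / (len(X) + 1)   # Gaussian copula: ranks -> (0,1)
    Z = norm.ppf(U)                          # normal scores
    R = np.corrcoef(Z, rowvar=False)
    R0 = R.copy()
    R0[:d1, d1:] = 0.0                       # remove dependence between the
    R0[d1:, :d1] = 0.0                       # first d1 and the remaining columns
    return bures_wasserstein(R, R0)

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 1))               # shared factor induces dependence
X = np.hstack([Z + 0.5 * rng.normal(size=(1000, 2)),   # first random vector
               Z + 0.5 * rng.normal(size=(1000, 2))])  # second random vector
print(wasserstein_dependence(X, d1=2))
```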
{"title":"High-dimensional copula-based Wasserstein dependence","authors":"Steven De Keyser, Irène Gijbels","doi":"10.1016/j.csda.2024.108096","DOIUrl":"10.1016/j.csda.2024.108096","url":null,"abstract":"<div><div>The aim is to generalize 2-Wasserstein dependence coefficients to measure dependence between a finite number of random vectors. This generalization includes theoretical properties, and in particular focuses on an interpretation of maximal dependence and an asymptotic normality result for a proposed semi-parametric estimator under a Gaussian copula assumption. In addition, it is of interest to look at general axioms for dependence measures between multiple random vectors, at plausible normalizations, and at various examples. Afterwards, it is important to study plug-in estimators based on penalized empirical covariance matrices in order to deal with high dimensionality issues and taking possible marginal independencies into account by inducing (block) sparsity. The latter ideas are investigated via a simulation study, considering other dependence coefficients as well. The use of the developed methods is illustrated in two real data applications.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108096"},"PeriodicalIF":1.5,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves
Pub Date : 2024-11-12 | DOI: 10.1016/j.csda.2024.108094
Tui H. Nolan , Sylvia Richardson , Hélène Ruffieux
The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is conducted using variational inference, ensuring that accurate posterior approximation and robust uncertainty quantification are achieved. The algorithm also introduces a novel variational message passing fragment for the multivariate functional principal component Gaussian likelihood, which enables modularity and reuse across models. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and to characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package bayesFPCA.
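For orientation, a generic multivariate Karhunen-Loève-type representation with shared scores, of the kind underlying such models (notation illustrative, not necessarily the paper's):

```latex
y_{ij}(t) \;=\; \mu_j(t) \;+\; \sum_{l=1}^{L} \zeta_{il}\,\psi_{jl}(t) \;+\; \varepsilon_{ij}(t),
\qquad \varepsilon_{ij}(t) \sim \mathrm{N}\!\left(0, \sigma_j^2\right),
```

where y_{ij} is the j-th curve of subject i, the eigenfunctions ψ_{jl} are curve-specific, and the score ζ_{il} is shared across all curves of subject i, which is what lets the model borrow strength across curves without estimating full covariance matrices.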
A Dirichlet process model for directional-linear data with application to bloodstain pattern analysis
Pub Date : 2024-11-12 | DOI: 10.1016/j.csda.2024.108093
Tong Zou, Hal S. Stern
Directional data require specialized models because of the non-Euclidean nature of their domain. When a directional variable is observed jointly with linear variables, modeling their dependence adds an additional layer of complexity. A Bayesian nonparametric approach is introduced to analyze directional-linear data. Firstly, the projected normal distribution is extended to model the joint distribution of linear variables and a directional variable with arbitrary dimension projected from a higher-dimensional augmented multivariate normal distribution. The new distribution is called the semi-projected normal distribution (SPN) and can be used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. Then, a conditional inverse-Wishart distribution is proposed as part of the prior distribution to address an identifiability issue inherited from the projected normal and preserve conjugacy with the SPN. The SPN mixture model shows superior performance in clustering on synthetic data compared to the semi-wrapped Gaussian model. The experiments show the ability of the SPN mixture model to characterize bloodstain patterns. A hierarchical Dirichlet process model with the SPN distribution is built to estimate the likelihood of bloodstain patterns under a posited causal mechanism for use in a likelihood ratio approach to the analysis of forensic bloodstain pattern evidence.
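A minimal sketch of the semi-projected normal construction as described (dimensions, parameters, and variable names are illustrative): draw from an augmented multivariate normal, project one block onto the unit sphere to obtain the directional part, and keep the remaining coordinates linear.

```python
import numpy as np

rng = np.random.default_rng(0)
d_dir, d_lin = 3, 2                        # pre-projection directional dims, linear dims
mean = np.zeros(d_dir + d_lin)
cov = np.eye(d_dir + d_lin)
cov[0, d_dir] = cov[d_dir, 0] = 0.6        # directional-linear dependence in the
                                           # augmented normal (illustrative value)

W = rng.multivariate_normal(mean, cov, size=1000)
direction = W[:, :d_dir] / np.linalg.norm(W[:, :d_dir], axis=1, keepdims=True)
linear = W[:, d_dir:]
# Each row of (direction, linear) is a draw from an SPN-type distribution; a
# Dirichlet process mixture would place a prior over the (mean, cov) atoms.
print(direction[:2], linear[:2])
```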
{"title":"A Dirichlet process model for directional-linear data with application to bloodstain pattern analysis","authors":"Tong Zou, Hal S. Stern","doi":"10.1016/j.csda.2024.108093","DOIUrl":"10.1016/j.csda.2024.108093","url":null,"abstract":"<div><div>Directional data require specialized models because of the non-Euclidean nature of their domain. When a directional variable is observed jointly with linear variables, modeling their dependence adds an additional layer of complexity. A Bayesian nonparametric approach is introduced to analyze directional-linear data. Firstly, the projected normal distribution is extended to model the joint distribution of linear variables and a directional variable with arbitrary dimension projected from a higher-dimensional augmented multivariate normal distribution. The new distribution is called the semi-projected normal distribution (SPN) and can be used as the mixture distribution in a Dirichlet process model to obtain a more flexible class of models for directional-linear data. Then, a conditional inverse-Wishart distribution is proposed as part of the prior distribution to address an identifiability issue inherited from the projected normal and preserve conjugacy with the SPN. The SPN mixture model shows superior performance in clustering on synthetic data compared to the semi-wrapped Gaussian model. The experiments show the ability of the SPN mixture model to characterize bloodstain patterns. A hierarchical Dirichlet process model with the SPN distribution is built to estimate the likelihood of bloodstain patterns under a posited causal mechanism for use in a likelihood ratio approach to the analysis of forensic bloodstain pattern evidence.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108093"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lost in the shuffle: Testing power in the presence of errorful network vertex labels
Pub Date : 2024-11-12 | DOI: 10.1016/j.csda.2024.108091
Ayushi Saxena, Vince Lyzinski
Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. This assumption is often untrue, and the power of the subsequent test can degrade when there are misaligned/label-shuffled vertices across networks. This power loss due to shuffling is theoretically explored in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further demonstrated by numerous simulations and experiments, in both the stochastic block model and the random dot product graph model, where the power loss across multiple recently proposed tests in the literature is considered. Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.
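A small illustration of the phenomenon, assuming a Frobenius-norm statistic on adjacency matrices and a two-block SBM (all sizes and probabilities are illustrative): even when both networks come from the same model, shuffling the vertex labels of one of them inflates the statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
B = np.array([[0.5, 0.2],
              [0.2, 0.5]])                  # 2-block SBM connectivity matrix
z = rng.integers(0, 2, n)                   # block memberships
P = B[z][:, z]                              # edge probability matrix

def sample_graph(P):
    """Sample an undirected, hollow adjacency matrix with edge probabilities P."""
    A = (rng.uniform(size=P.shape) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T

A1, A2 = sample_graph(P), sample_graph(P)   # same model, so the null is true
stat_aligned = np.linalg.norm(A1 - A2, "fro")

perm = rng.permutation(n)                   # errorful vertex labels on one network
stat_shuffled = np.linalg.norm(A1 - A2[perm][:, perm], "fro")
print(stat_aligned, stat_shuffled)          # shuffling inflates the statistic
```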
{"title":"Lost in the shuffle: Testing power in the presence of errorful network vertex labels","authors":"Ayushi Saxena, Vince Lyzinski","doi":"10.1016/j.csda.2024.108091","DOIUrl":"10.1016/j.csda.2024.108091","url":null,"abstract":"<div><div>Two-sample network hypothesis testing is an important inference task with applications across diverse fields such as medicine, neuroscience, and sociology. Many of these testing methodologies operate under the implicit assumption that the vertex correspondence across networks is a priori known. This assumption is often untrue, and the power of the subsequent test can degrade when there are misaligned/label-shuffled vertices across networks. This power loss due to shuffling is theoretically explored in the context of random dot product and stochastic block model networks for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices. The loss in testing power is further reinforced by numerous simulations and experiments, both in the stochastic block model and in the random dot product graph model, where the power loss across multiple recently proposed tests in the literature is considered. Lastly, the impact that shuffling can have in real-data testing is demonstrated in a pair of examples from neuroscience and from social network analysis.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"204 ","pages":"Article 108091"},"PeriodicalIF":1.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical modeling of Dengue transmission dynamics with environmental factors
Pub Date : 2024-11-08 | DOI: 10.1016/j.csda.2024.108080
Lengyang Wang , Mingke Zhang
Dengue fever is one of the most common mosquito-borne infectious diseases in tropical regions. Understanding the dynamics of dengue transmission can help provide timely early warnings, thereby reducing mortality. However, previous studies have failed to faithfully simulate dengue dynamics and to answer questions pertinent to outbreaks. A new substantive model is proposed that incorporates environmental factors into a time-series-susceptible-infectious-recovered (TSIR) model in order to analyze their impact on transmission. The newly proposed environmental-time-series-susceptible-infectious-recovered (ETSIR) model can statistically highlight their significance for dengue transmission, thus providing deeper insight into the transmission dynamics and addressing several epidemiological puzzles.
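For context, a generic TSIR recursion with covariate-dependent transmission, consistent with the description above (the paper's exact ETSIR specification may differ):

```latex
\mathbb{E}\left[I_{t+1}\right] \;=\; \beta_t \, S_t \, I_t^{\alpha} / N_t,
\qquad
\log \beta_t \;=\; \beta_0 + \sum_{k} \gamma_k \, x_{k,t},
```

where I_t, S_t and N_t are the infected, susceptible and total population counts, α is a mixing exponent, and the x_{k,t} are environmental covariates such as temperature or rainfall.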
{"title":"Statistical modeling of Dengue transmission dynamics with environmental factors","authors":"Lengyang Wang , Mingke Zhang","doi":"10.1016/j.csda.2024.108080","DOIUrl":"10.1016/j.csda.2024.108080","url":null,"abstract":"<div><div>Dengue fever is one of the most common mosquito-borne infectious diseases in tropical regions. Understanding the dynamics of dengue transmission can help provide timely early warnings, thereby reducing mortality. However, previous studies have failed to simulate faithfully dengue dynamics and answer questions pertinent to outbreaks. By incorporating environmental factors into a time-series-susceptible-infectious-recovered (TSIR) model, a new substantive model, to analyze their impact on transmission, is proposed. The newly proposed environmental-time-series-susceptible-infectious-recovered (ETSIR) model can highlight statistically their significance on dengue transmission, thus providing deeper insight into the transmission and addressing several epidemiological puzzles.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108080"},"PeriodicalIF":1.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of order-of-addition experiments
Pub Date : 2024-11-06 | DOI: 10.1016/j.csda.2024.108077
Xueru Zhang , Dennis K.J. Lin , Min-Qian Liu , Jianbin Chen
The order-of-addition (OofA) experiment involves arranging components in a specific order to optimize a certain objective, and it is attracting a great deal of attention in many disciplines, especially biochemistry, scheduling, and engineering. Recent studies have highlighted its significance, and notable works have aimed to address NP-hard OofA problems from a statistical perspective. However, solving OofA problems presents challenges due to their complex nature and the presence of uncertainty, as in scheduling problems where processing times are not known with certainty in advance. Such uncertainty introduces heteroscedasticity into OofA experiments, where different orders result in varying dispersions. To address these challenges, a unified framework is proposed to analyze scheduling problems without making specific assumptions about the distribution of these uncertainties. It encompasses model development and optimization, encapsulating existing homoscedastic studies (where different orders produce the same dispersion value) as a specific instance. For heteroscedastic cases, a dual response optimization within an uncertainty set is proposed, aiming to minimize the dispersion of the response while keeping its location at a predefined target value. Solving the proposed non-linear minimax optimization is rather challenging, however, so an equivalent optimization formulation with low computational cost is proposed. Theoretical support is established to ensure the tractability of the proposed method, and simulation studies demonstrate its effectiveness. With its solid theoretical support, ease of implementation, and ability to find an optimal order, the proposed approach offers a practical and competitive solution to general order-of-addition problems.
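In generic form, the dual response optimization described above can be written as follows (notation illustrative, not the paper's exact formulation):

```latex
\min_{\pi \in \Pi} \; \max_{u \in \mathcal{U}} \; \widehat{\sigma}^{2}(\pi, u)
\qquad \text{subject to} \qquad \widehat{\mu}(\pi) = \tau,
```

where π ranges over the component orders in Π, 𝒰 is the uncertainty set, σ̂² and μ̂ are the fitted dispersion and location responses, and τ is the target value for the location of the response.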
{"title":"Analysis of order-of-addition experiments","authors":"Xueru Zhang , Dennis K.J. Lin , Min-Qian Liu , Jianbin Chen","doi":"10.1016/j.csda.2024.108077","DOIUrl":"10.1016/j.csda.2024.108077","url":null,"abstract":"<div><div>The order-of-addition (OofA) experiment involves arranging components in a specific order to optimize a certain objective, which is attracting a great deal of attention in many disciplines, especially in the areas of biochemistry, scheduling, and engineering. Recent studies have highlighted its significance, and notable works have aimed to address NP-hard OofA problems from a statistical perspective. However, solving OofA problems presents challenges due to their complex nature and the presence of uncertainty, such as scheduling problems with uncertain processing times. These uncertainties affect processing times, which are not known with certainty in advance. They introduce heteroscedasticity into OofA experiments, where different orders result in varying dispersions. To address these challenges, a unified framework is proposed to analyze scheduling problems without making specific assumptions about the distribution of these certainties. It encompasses model development and optimization, encapsulating existing homoscedastic studies (where different orders produce the same dispersion value) as a specific instance. For heteroscedastic cases, a dual response optimization within an uncertainty set is proposed, aiming to minimize the dispersion of response while keeping the location of response with a predefined target value. However, solving the proposed non-linear minimax optimization is rather challenging. An equivalent optimization formulation with low computational cost is proposed for solving such a challenging problem. Theoretical supports are established to ensure the tractability of the proposed method. Simulation studies are conducted to demonstrate the effectiveness of the proposed approach. With its solid theoretical support, ease of implementation, and ability to find an optimal order, the proposed approach offers a practical and competitive solution to solving general order-of-addition problems.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"203 ","pages":"Article 108077"},"PeriodicalIF":1.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}