Computational Statistics & Data Analysis: Latest Articles

Adaptive-to-sub-null testing for mediation effects in structural equation models
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-20; DOI: 10.1016/j.csda.2025.108205
Jiaqi Huang, Chuyun Ye, Lixing Zhu
To effectively implement large-scale hypothesis testing of causal mediation effects and control false discovery rate (FDR) for linear structural equation models, this paper proposes an Adaptive-to-Sub-Null test (AtST) tailored specifically for the assessment of multidimensional mediation effects. The significant distinction of AtST from existing methods is that for every mediator, the weak limits of the test statistic under all mutually exclusive sub-null hypotheses uniformly conform to a chi-square distribution with one degree of freedom. Therefore, in the asymptotic sense, the significance level can be maintained and the p-values can be computed easily without any other prior information on the sub-null hypotheses or resampling technique. In theoretical investigations, we extend existing parameter estimation methods by allowing lower sparsity level in high-dimensional covariate vectors. These results offer a solid base for better FDR control by directly applying the classical Storey's method. We also apply a data-driven approach for selecting the tuning parameter of Storey's estimator. Simulations are conducted to demonstrate the efficacy and validity of the AtST, complemented by an analytical exploration of a genuine dataset for illustration.
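The two ingredients the abstract names, p-values from a chi-square limit with one degree of freedom and Storey's estimator feeding an FDR procedure, can be sketched as follows. This is a minimal illustration, not the authors' AtST implementation: the function names are hypothetical, the tuning parameter `lam` is fixed at 0.5 rather than chosen by the paper's data-driven rule, and the rejection step is the usual step-up rule applied with the estimated null proportion.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable.

    For one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


def storey_pi0(pvalues, lam=0.5):
    """Storey's estimate of the proportion of true nulls.

    lam is the tuning parameter; fixing it at 0.5 is an assumption made here.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def storey_reject(pvalues, alpha=0.05, lam=0.5):
    """Indices rejected by the step-up rule using the estimated null proportion."""
    m = len(pvalues)
    pi0 = storey_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose adjusted p-value stays below alpha.
        if pi0 * m * pvalues[i] / rank <= alpha:
            k = rank
    return sorted(order[:k])
```

Note that when no p-value exceeds `lam` the estimate is zero and every hypothesis is rejected, so a practical version would need a guard for that case.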
Computational Statistics & Data Analysis, Volume 211, Article 108205.
Citations: 0
Exact statistical analysis for response-adaptive clinical trials: A general and computationally tractable approach
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-19; DOI: 10.1016/j.csda.2025.108207
Stef Baas, Peter Jacko, Sofía S. Villar
Response-adaptive clinical trial designs allow targeting a given objective by skewing the allocation of participants to treatments based on observed outcomes. Response-adaptive designs face greater regulatory scrutiny due to potential type I error rate inflation, which limits their uptake in practice. Existing approaches for type I error control either only work for specific designs, have a risk of Monte Carlo/approximation error, are conservative, or computationally intractable. To this end, a general and computationally tractable approach is developed for exact analysis in two-arm response-adaptive designs with binary outcomes. This approach can construct exact tests for designs using either a randomized or deterministic response-adaptive procedure. The constructed conditional and unconditional exact tests generalize Fisher's and Barnard's exact tests, respectively. Furthermore, the approach allows for complexities such as delayed outcomes, early stopping, or allocation of participants in blocks. The efficient implementation of forward recursion allows for testing of two-arm trials with 1,000 participants on a standard computer. Through an illustrative computational study of trials using randomized dynamic programming it is shown that, contrary to what is known for equal allocation, the conditional exact Wald test based on total successes has, almost uniformly, higher power than the unconditional exact Wald test. Two real-world trials with the above-mentioned complexities are re-analyzed to demonstrate the value of the new approach in controlling type I errors and/or improving the statistical power.
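The forward-recursion idea mentioned in the abstract can be illustrated with a small sketch: propagate the exact probability of every trial state one participant at a time, then sum the probability of the states in a rejection region. This is an assumption-laden toy version, not the authors' algorithm; the state layout `(s0, f0, s1, f1)`, the function names, and the `alloc_prob` callback are hypothetical, and delayed outcomes, early stopping, and blocks are ignored.

```python
from collections import defaultdict


def forward_recursion(n, p, alloc_prob):
    """Exact distribution of final states after n participants.

    A state is (s0, f0, s1, f1): successes and failures on arms 0 and 1.
    p = (p0, p1) are the success probabilities of the two arms.
    alloc_prob(s0, f0, s1, f1) returns the probability of assigning the
    next participant to arm 0 (use 0.0 or 1.0 for a deterministic rule).
    """
    dist = {(0, 0, 0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (s0, f0, s1, f1), pr in dist.items():
            a = alloc_prob(s0, f0, s1, f1)
            nxt[(s0 + 1, f0, s1, f1)] += pr * a * p[0]
            nxt[(s0, f0 + 1, s1, f1)] += pr * a * (1.0 - p[0])
            nxt[(s0, f0, s1 + 1, f1)] += pr * (1.0 - a) * p[1]
            nxt[(s0, f0, s1, f1 + 1)] += pr * (1.0 - a) * (1.0 - p[1])
        dist = dict(nxt)
    return dist


def rejection_probability(dist, reject):
    """Total probability of the final states for which reject(state) is true."""
    return sum(pr for state, pr in dist.items() if reject(state))
```

Evaluating `rejection_probability` with `p0 == p1` gives the exact type I error rate of a chosen rejection region for that common success probability; a rule such as `lambda s0, f0, s1, f1: (s0 + 1) / (s0 + s1 + 2)` would serve as a simple randomized response-adaptive allocation for experimentation.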
Computational Statistics & Data Analysis, Volume 211, Article 108207.
Citations: 0
A Dirichlet stochastic block model for composition-weighted networks
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-16; DOI: 10.1016/j.csda.2025.108204
Iuliia Promskaia, Adrian O'Hagan, Michael Fop
Network data are prevalent in applications where individual entities interact with each other, and often these interactions have associated weights representing the strength of association. Clustering such weighted network data is a common task, which involves identifying groups of nodes that display similarities in the way they interact. However, traditional clustering methods typically use edge weights in their raw form, overlooking that the observed weights are influenced by the nodes' capacities to distribute weights along the edges. This can lead to clustering results that primarily reflect nodes' total weight capacities rather than the specific interactions between them. One way to address this issue is to analyse the strengths of connections in relative rather than absolute terms, by transforming the relational weights into a compositional format. This approach expresses each edge weight as a proportion of the sending or receiving weight capacity of the respective node. To cluster these data, a Dirichlet stochastic block model tailored for composition-weighted networks is proposed. The model relies on direct modelling of compositional weight vectors using a Dirichlet mixture, where parameters are determined by the cluster labels of sender and receiver nodes. Inference is implemented via an extension of the classification expectation-maximisation algorithm, expressing the complete data likelihood of each node as a function of fixed cluster labels of the remaining nodes. A model selection criterion is derived to determine the optimal number of clusters. The proposed approach is validated through simulation studies, and its practical utility is illustrated on two real-world networks.
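The compositional transformation described in the abstract, where each edge weight becomes a proportion of the sending node's total weight, might look like the sketch below. It assumes a dense adjacency matrix given as a list of rows with self-loops ignored; the function name is hypothetical, and the receiving-capacity variant would normalise columns instead.

```python
def to_compositions(weights):
    """Convert raw edge weights into sender-side compositions.

    weights[i][j] is the raw weight of the edge from node i to node j.
    Each row is divided by node i's total outgoing weight, so the
    off-diagonal entries of a row sum to 1. Self-loops are set to 0,
    and a node with no outgoing weight keeps an all-zero row.
    """
    result = []
    for i, row in enumerate(weights):
        total = sum(w for j, w in enumerate(row) if j != i)
        if total == 0:
            result.append([0.0] * len(row))
        else:
            result.append([0.0 if j == i else w / total for j, w in enumerate(row)])
    return result
```

With this transformation, two nodes that split their weight in the same proportions receive the same composition even if one sends five times more weight in total, which is the property the abstract argues raw-weight clustering overlooks.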
Computational Statistics & Data Analysis, Volume 211, Article 108204.
Citations: 0
Penalized maximum likelihood estimation with nonparametric Gaussian scale mixture errors
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-16; DOI: 10.1016/j.csda.2025.108206
Seo-Young Park, Byungtae Seo
The penalized least squares and maximum likelihood methods have been successfully employed for simultaneous parameter estimation and variable selection. However, outlying observations can severely affect the quality of the estimator and selection performance. Although some robust methods for variable selection have been proposed in the literature, they often lose substantial efficiency. This is primarily attributed to the excessive dependence on choosing additional tuning parameters or modifying the original objective functions as tools to enhance robustness. In response to these challenges, we use a nonparametric Gaussian scale mixture distribution for the regression error distribution. This approach allows the error distributions in the model to achieve great flexibility and provides data-adaptive robustness. Our proposed estimator exhibits desirable theoretical properties, including sparsity and oracle properties. In the estimation process, we employ a combination of expectation-maximization and gradient-based algorithms for the parametric and nonparametric components, respectively. Through comprehensive numerical studies, encompassing simulation studies and real data analysis, we substantiate the robust performance of the proposed method.
Computational Statistics & Data Analysis, Volume 211, Article 108206.
Citations: 0
Heavy-tailed matrix-variate hidden Markov models
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-12; DOI: 10.1016/j.csda.2025.108198
Salvatore D. Tomarchio
The matrix-variate framework for hidden Markov models (HMMs) is expanded with two families of models using matrix-variate t and contaminated normal distributions. These models improve the handling of tail behavior, clustering, and address challenges in identifying outlying matrices in matrix-variate data. Two Expectation-Conditional Maximization (ECM) algorithms are implemented in the R package MatrixHMM for parameter estimation. Simulations assess parameter recovery, robustness, anomaly detection, and show the advantages over alternative approaches. The models are applied to real-world data to analyze labor market dynamics across Italian provinces.
Computational Statistics & Data Analysis, Volume 211, Article 108198.
Citations: 0
Statistical inference for partially shape-constrained function-on-scalar linear regression models
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-12; DOI: 10.1016/j.csda.2025.108200
Kyunghee Han, Yeonjoo Park, Soo-Young Kim
Functional linear regression models are widely used to link functional/longitudinal outcomes with multiple scalar predictors, identifying time-varying covariate effects through regression coefficient functions. Beyond assessing statistical significance, characterizing the shapes of coefficient functions is crucial for drawing interpretable scientific conclusions. Existing studies on shape-constrained analysis primarily focus on global shapes, which require strict prior knowledge of functional relationships across the entire domain. This often leads to misspecified regression models due to a lack of prior information, making them impractical for real-world applications. To address this, a flexible framework is introduced to identify partial shapes in regression coefficient functions. The proposed partial shape-constrained analysis enables researchers to validate functional shapes within a targeted sub-domain, avoiding the misspecification of shape constraints outside the sub-domain of interest. The method also allows for testing different sub-domains for individual covariates and multiple partial shape constraints across composite sub-domains. Our framework supports both kernel- and spline-based estimation approaches, ensuring robust performance with flexibility in computational preference. Finite-sample experiments across various scenarios demonstrate that the proposed framework significantly outperforms the application of global shape constraints to partial domains in both estimation and inference procedures. The inferential tool particularly maintains the type I error rate at the nominal significance level and exhibits increasing power with larger sample sizes, confirming the consistency of the test procedure. The practicality of partial shape-constrained inference is demonstrated through two applications: a clinical trial on NeuroBloc for type A-resistant cervical dystonia and the National Institute of Mental Health Schizophrenia Study.
Computational Statistics & Data Analysis, Volume 211, Article 108200.
Citations: 0
Distributed variable screening for generalized linear models
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-12; DOI: 10.1016/j.csda.2025.108203
Tianbo Diao, Bo Li, Lianqiang Qu, Liuquan Sun
In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates by using a sparsity-restricted surrogate likelihood estimator. It takes into account the joint effects of the covariates rather than just the marginal effect, and this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that with a high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite sample performance of the proposed method, and an application to a real dataset showcases its practical utility.
Computational Statistics & Data Analysis, Volume 211, Article 108203.
Citations: 0
Quantile Super Learning for independent and online settings with application to solar power forecasting
IF 1.5; Zone 3 (Mathematics); Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; Pub Date: 2025-05-09; DOI: 10.1016/j.csda.2025.108202
Herbert Susmann, Antoine Chambaz
Estimating quantiles of an outcome conditional on covariates is of fundamental interest in statistics with broad application in probabilistic prediction and forecasting. An ensemble method for conditional quantile estimation is proposed, Quantile Super Learning, that combines predictions from multiple candidate algorithms based on their empirical performance measured with respect to a cross-validated empirical risk of the quantile loss function. Theoretical guarantees for both i.i.d. and online data scenarios are presented. The performance of this approach for quantile estimation and in forming prediction intervals is tested in simulation studies. Two case studies related to solar energy are used to illustrate Quantile Super Learning: in an i.i.d. setting, we predict the physical properties of perovskite materials for photovoltaic cells, and in an online setting we forecast ground solar irradiance based on output from dynamic weather ensemble models.
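The quantile loss that the abstract uses to score candidate algorithms can be written down in a few lines. The sketch below is a simplification under stated assumptions: it picks the single candidate with the lowest empirical risk on held-out predictions (a "discrete" selection), whereas the paper combines candidates; all function names are hypothetical, and the predictions are assumed to come from folds the candidates were not trained on.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting quantile q for outcome y at level tau."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def empirical_risk(ys, preds, tau):
    """Average pinball loss of a vector of predictions."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, preds)) / len(ys)


def select_candidate(ys, candidate_preds, tau):
    """Return the name of the lowest-risk candidate and all risks.

    candidate_preds maps a candidate name to its held-out predictions,
    aligned with ys.
    """
    risks = {name: empirical_risk(ys, preds, tau) for name, preds in candidate_preds.items()}
    best = min(risks, key=risks.get)
    return best, risks
```

The asymmetry of the loss is what targets a quantile: at `tau = 0.9`, under-prediction costs nine times as much as over-prediction of the same size, so the risk is minimised by predictions near the 90th conditional percentile.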
Computational Statistics & Data Analysis, Volume 211, Article 108202.
Citations: 0
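The selection step described in the abstract above, choosing among candidate algorithms by the cross-validated empirical risk of the quantile loss, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it picks a single best candidate rather than forming a weighted combination, and the candidate learners `fit_marginal_quantile` and `fit_mean_only`, the helper names, and the simulated data are all assumptions made for the example.

```python
import random


def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting quantile q when y is observed."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff


def cv_risk(fit, data, tau, n_folds=5):
    """Average held-out pinball loss of one candidate across n_folds folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    total, count = 0.0, 0
    for k in range(n_folds):
        train = [obs for j, fold in enumerate(folds) if j != k for obs in fold]
        predict = fit(train, tau)
        for x, y in folds[k]:
            total += pinball_loss(y, predict(x), tau)
            count += 1
    return total / count


# Hypothetical candidate 1: the empirical tau-quantile of the training outcomes.
def fit_marginal_quantile(train, tau):
    ys = sorted(y for _, y in train)
    q = ys[min(len(ys) - 1, int(tau * len(ys)))]
    return lambda x: q


# Hypothetical candidate 2: the training mean, which ignores tau entirely.
def fit_mean_only(train, tau):
    ys = [y for _, y in train]
    m = sum(ys) / len(ys)
    return lambda x: m


def select_candidate(candidates, data, tau):
    """Return the candidate name with the lowest cross-validated risk."""
    risks = {name: cv_risk(fit, data, tau) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks


# Simulated skewed outcomes (assumption: the covariate x carries no signal here).
rng = random.Random(0)
data = [(x, rng.expovariate(1.0)) for x in range(200)]
candidates = {
    "marginal_quantile": fit_marginal_quantile,
    "mean_only": fit_mean_only,
}
best, risks = select_candidate(candidates, data, tau=0.9)
print(best, risks)
```

For a skewed outcome and a high quantile level such as 0.9, one would expect the quantile-based candidate to achieve the lower risk, because the pinball loss penalizes under-prediction more heavily than over-prediction at that level.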
Monotone composite quantile regression neural network for censored data with a cure fraction
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-05-08 DOI: 10.1016/j.csda.2025.108201
Xinran Zhang, Xiaohui Yuan, Chunjie Wang, Xinyuan Song
The cure rate monotone composite quantile regression neural network model is investigated as an extension of the cure rate quantile model. It can uncover complex nonlinear relationships and effectively ensure the non-crossing of quantile predictions. An iterative algorithm coupled with data augmentation is developed to predict the survival time of susceptible subjects and the cure rate among all subjects. Simulation studies indicate that the proposed approach exhibits advantages in prediction over traditional statistical methods in finite samples when nonlinearity exists between response and predictors. The analysis of two real datasets further validates the utility of the proposed method.
Computational Statistics & Data Analysis, vol. 211, Article 108201 (published 2025-05-08).
Citations: 0
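Two ideas named in the abstract above, a composite loss over several quantile levels and non-crossing quantile predictions, can be illustrated with a short sketch. It is not the authors' model: censoring, the cure fraction, and the neural network are all omitted, and where the paper enforces monotonicity inside the network, this example swaps in a simple post-hoc repair by sorting. The function names and toy numbers are assumptions.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting quantile q when y is observed."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff


def composite_quantile_loss(ys, preds, taus):
    """Pinball loss averaged over observations and quantile levels.

    preds[i][k] is the predicted taus[k]-quantile for observation i.
    """
    total = 0.0
    for y, row in zip(ys, preds):
        for q, tau in zip(row, taus):
            total += pinball_loss(y, q, tau)
    return total / (len(ys) * len(taus))


def has_crossing(row):
    """True if predicted quantiles decrease anywhere as tau increases."""
    return any(a > b for a, b in zip(row, row[1:]))


taus = [0.25, 0.5, 0.75]
ys = [1.0, 2.0]
# Toy predictions: the first row crosses (its 0.5-quantile exceeds its 0.75-quantile).
preds = [[0.5, 1.2, 1.0], [1.5, 2.0, 2.5]]

crossing_rows = [i for i, row in enumerate(preds) if has_crossing(row)]

# Post-hoc repair by sorting each row (an assumption, not the paper's approach).
repaired = [sorted(row) for row in preds]

loss_before = composite_quantile_loss(ys, preds, taus)
loss_after = composite_quantile_loss(ys, repaired, taus)
print(crossing_rows, loss_before, loss_after)
```

In this toy case, sorting removes the crossing and the composite loss does not increase. A model that is monotone in the quantile level by construction avoids the need for such a repair step.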
Latent-class trajectory modeling with a heterogeneous mean-variance relation
IF 1.5 Zone 3 (Mathematics) Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-05-02 DOI: 10.1016/j.csda.2025.108199
Niek G.P. Den Teuling, Francesco Ungolo, Steffen C. Pauws, Edwin R. van den Heuvel
The benefit of addressing heteroskedastic residual variances across trajectories is investigated with the purpose of finding clusters of longitudinal trajectories. Models are proposed to account for class-specific heteroskedasticity through a mean-variance relation or random residual variance, thereby accounting for trajectory-specific variance. The analyzed latent-class trajectory models are an extension of growth mixture models (GMM). The estimation bias of the model parameters and the recoverability of the number of latent classes are assessed under various data-generating models and settings by means of a simulation study. Furthermore, the empirical applicability of these models is demonstrated through the analysis of the time-varying incidence rate of COVID-19 cases across counties in the United States. Overall, the class-specific mean-variance could be reliably estimated by the proposed models in datasets comprising 250 trajectories. In addition, the extended GMM accounting for the residual random variance showed improved group trajectory estimation over the standard GMM.
Computational Statistics & Data Analysis, vol. 210, Article 108199 (published 2025-05-02).
Citations: 0