The minimum moment aberration and the minimum Lee‐moment aberration criteria are two popular, conceptually simple and computationally cheap criteria for selecting good designs. However, the minimum moment aberration criterion is suitable only for qualitative factors, and the minimum Lee‐moment aberration criterion cannot distinguish some designs with high‐level quantitative factors. In this paper, the minimum absolute‐moment aberration criterion is proposed to compare and select designs with multi‐level quantitative factors. We validate the statistical justification of this criterion from both theoretical and numerical perspectives. Furthermore, we extend the minimum absolute‐moment aberration criterion to screening designs with both qualitative and quantitative factors, naming the new criterion the minimum mixed‐moment aberration criterion. We then utilise a numerical study to compare and evaluate the performance of some popular designs with both qualitative and quantitative factors in computer experiments.
Yao Xiao, Na Zou, Hong Qin and Kang Wang, "Generalised minimum moment aberration for designs with both qualitative and quantitative factors", Stat, 2024-05-01, doi:10.1002/sta4.684.
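As an illustration of the moment-style computations these criteria rest on, the sketch below computes a coincidence-based power moment (the quantity behind minimum moment aberration for qualitative factors) and, as a hypothetical stand-in for the absolute-moment idea, a moment of pairwise absolute differences. The paper's exact definitions and scalings may differ; this is only a sketch of the general mechanics.

```python
import numpy as np

def power_moment(design, q):
    """q-th power moment based on row coincidences (qualitative factors):
    the average over row pairs of (number of coinciding entries)^q."""
    design = np.asarray(design)
    n = design.shape[0]
    vals = []
    for i in range(n):
        for j in range(i + 1, n):
            delta = np.sum(design[i] == design[j])  # coincidences between rows i, j
            vals.append(delta ** q)
    return float(np.mean(vals))

def abs_moment(design, q):
    """q-th moment of the pairwise L1 (absolute-difference) row distances,
    an illustrative distance better suited to quantitative levels."""
    design = np.asarray(design, dtype=float)
    n = design.shape[0]
    vals = []
    for i in range(n):
        for j in range(i + 1, n):
            vals.append(np.sum(np.abs(design[i] - design[j])) ** q)
    return float(np.mean(vals))
```

Small moments mean rows are, on average, far apart, which is why designs are ranked by sequentially minimising these moments.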
Despite the increasing importance of high‐dimensional varying coefficient models, the study of their Bayesian versions is still in its infancy. This paper contributes to the literature by developing a sparse empirical Bayes formulation that addresses the problem of high‐dimensional model selection in the framework of Bayesian varying coefficient modelling under Gaussian process (GP) priors. To break the computational bottleneck of GP‐based varying coefficient modelling, we introduce a low‐cost computation strategy that incorporates linear algebra techniques and the Laplace approximation into the evaluation of the high‐dimensional posterior model distribution. A simulation study is conducted to demonstrate the superiority of the proposed Bayesian method compared to an existing high‐dimensional varying coefficient modelling approach. In addition, its applicability to real data analysis is illustrated using yeast cell cycle data.
Myungjin Kim and Gyuhyeong Goh, "A sparse empirical Bayes approach to high‐dimensional Gaussian process‐based varying coefficient models", Stat, 2024-04-20, doi:10.1002/sta4.678.
Emily Slade, Sarah Jane K. Robbins, Kristen J. McQuerry, Anthony A. Mangino
Collaborative biostatistics units within universities and academic medical centres operate under a wide range of different funding models; common to many of these models is the challenge of allocating time to activities that are not linked to a specific research project, such as professional development, mentorship and administrative tasks. The purpose of this paper is to describe a proposed model for ‘flexible funding’, that is, funding that is not linked to a specific research project, within a collaborative biostatistics unit and to detail the benefits and challenges associated with the proposed model. We present results from a qualitative study representing the perspectives of collaborative biostatisticians working under the proposed flexible funding model. In addition to providing examples of activities undertaken as part of time allocated to flexible funding, the qualitative results reveal several benefits of flexible funding both for a collaborative biostatistician (e.g., job satisfaction and professional development) and for the collaborative biostatistics unit as a whole (e.g., retention, process improvement, and leadership).
Emily Slade, Sarah Jane K. Robbins, Kristen J. McQuerry and Anthony A. Mangino, "The value of flexible funding for collaborative biostatistics units in universities and academic medical centres", Stat, 2024-04-20, doi:10.1002/sta4.679.
Nicola Hewett, Lee Fawcett, Andrew Golightly, Neil Thorpe
Improving road safety is hugely important, with the number of deaths on the world's roads remaining unacceptably high; an estimated 1.3 million people die each year as a result of road traffic collisions. Current practice for treating collision hotspots is almost always reactive: once a threshold level of collisions has been exceeded during some pre‐determined observation period, treatment is applied (e.g., road safety cameras). Traffic collisions are rare, so prolonged observation periods are necessary. However, traffic conflicts are more frequent and carry only a fraction of the social cost; hence, traffic conflict before/after studies can be conducted over shorter time periods. We investigate the effect of implementing the leading pedestrian interval treatment at signalised intersections as a safety intervention in a city in North America. Pedestrian‐vehicle traffic conflict data were collected from treatment and control sites during the before and after periods. We implement a before/after study on post‐encroachment times (PETs), where small PET values denote ‘near‐misses’. Hence, extreme value theory is employed to model the extremes of our PET processes, with adjustments to the usual modelling framework to account for temporal dependence and treatment effects.
Nicola Hewett, Lee Fawcett, Andrew Golightly and Neil Thorpe, "Using extreme value theory to evaluate the leading pedestrian interval road safety intervention", Stat, 2024-04-18, doi:10.1002/sta4.676.
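To make the modelling idea concrete, here is a minimal peaks-over-threshold sketch on synthetic PET data: because near-misses are small PETs, the sample is negated so the lower tail becomes an upper tail, and a generalised Pareto distribution is fitted to the threshold exceedances. This omits the paper's adjustments for temporal dependence and treatment effects, and the exponential synthetic PETs are purely illustrative.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
pets = rng.exponential(scale=2.0, size=2000)    # synthetic PETs in seconds

# Near-misses are *small* PETs, so negate and study the upper tail of -PET.
x = -pets
u = np.quantile(x, 0.95)                        # high threshold on the negated scale
excesses = x[x > u] - u

# Peaks-over-threshold: fit a generalised Pareto to the exceedances.
shape, loc, scale = genpareto.fit(excesses, floc=0.0)

# Estimated probability of a PET below 0.05 s, i.e. P(-PET > -0.05).
p_u = np.mean(x > u)
p_near_miss = p_u * genpareto.sf(-0.05 - u, shape, loc=0.0, scale=scale)
```

The fitted tail gives estimates for events rarer than anything observed, which is what makes EVT attractive for near-miss surrogate studies.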
C. Taylor Brown, Megan Mehta, Mahathi Ryali, Xiaoran Dong, Iliya Shadfar, Jacqueline Dominquez Davalos, Aaron Culich, Anthony Suen
As one of the largest data science research incubator initiatives in the country, the University of California, Berkeley's Data Science Discovery Program serves as a case study for a scalable and sustainable model of data science consulting in higher education. This case contributes to the broader literature on data science consulting in higher education by analysing the programme's development; institutional influences; staffing and structural model; and defining features, which may prove instructive to similar programmes at other institutions. The programme is characterised by a unique structure of undergraduate consultations led by graduate student mentorship and governance; a streamlined, multidepartmental model that facilitates scalability and sustainability; and diverse modes for undergraduate consulting—including one‐on‐one ad‐hoc data science consultations, extended data science project development and management, peer mentorship and data science workshop instruction. This case demonstrates that universities may be able to initiate a low‐stakes, small‐scale data science consulting initiative and then progressively scale up the project in collaboration with multiple departments and organisations across campus.
C. Taylor Brown, Megan Mehta, Mahathi Ryali, Xiaoran Dong, Iliya Shadfar, Jacqueline Dominquez Davalos, Aaron Culich and Anthony Suen, "The data science discovery program: A model for data science consulting in higher education", Stat, 2024-04-18, doi:10.1002/sta4.677.
Mediation analysis aims to unveil the underlying relationship between an outcome variable and an exposure variable through one or more intermediate variables called mediators. In recent decades, research on mediation analysis has focused on multivariate mediation models, where the number of mediating variables may be high-dimensional. This paper concerns high-dimensional mediation analysis and proposes a three-step algorithm that extracts and utilizes the inter-connectivity among candidate mediators. More specifically, the proposed methodology starts with a screening procedure to reduce the dimensionality of the initial set of candidate mediators, followed by a penalized regression model that incorporates both parameter- and group-wise regularization, and ends with fitting a multivariate mediation model and identifying active mediating variables through a joint significance test. To showcase the performance of the proposed algorithm, we conducted two simulation studies in high-dimensional and ultra-high-dimensional settings, respectively. Furthermore, we demonstrate the practical applicability of the proposal using a real data set that uncovers the possible impact of environmental toxicants on women's gestational age at delivery through 61 biomarkers belonging to 7 biological pathways.
Jia Yuan Hu, Marley DeSimone and Qing Wang, "Utilizing latent connectivity among mediators in high-dimensional mediation analysis", Stat, 2024-04-16, doi:10.1002/sta4.675.
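A toy version of the screen, penalise, then jointly test pipeline might look as follows. The correlation screen, the plain (non-group) lasso solved by proximal gradient and the 0.05 cut-offs are all simplifications of the paper's actual algorithm, which additionally exploits group-wise regularisation over connected mediators.

```python
import numpy as np
from scipy.stats import linregress

def lasso_ista(A, b, alpha, iters=500):
    """Plain lasso via proximal gradient (ISTA); stands in for the
    paper's parameter- and group-wise penalised regression."""
    n, p = A.shape
    L = np.linalg.norm(A, 2) ** 2 / n            # Lipschitz constant of the gradient
    x = np.zeros(p)
    for _ in range(iters):
        x = x - (A.T @ (A @ x - b) / n) / L      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - alpha / L, 0.0)  # soft-threshold
    return x

def three_step_mediation(X, M, y, n_keep=10, alpha=0.05):
    """Step 1: screen mediators by |correlation| with the outcome.
    Step 2: penalised outcome regression on the screened set.
    Step 3: joint significance test on both paths X -> M_j and M_j -> y."""
    p = M.shape[1]
    score = np.abs([np.corrcoef(M[:, j], y)[0, 1] for j in range(p)])
    keep = np.argsort(score)[::-1][:n_keep]
    coef = lasso_ista(M[:, keep], y, alpha)
    active = keep[coef != 0]
    mediators = []
    for j in active:
        p_alpha = linregress(X, M[:, j]).pvalue  # exposure -> mediator path
        p_beta = linregress(M[:, j], y).pvalue   # mediator -> outcome path
        if max(p_alpha, p_beta) < 0.05:          # joint significance: both must hold
            mediators.append(int(j))
    return mediators
```

The joint significance (max-p) test in step 3 is the standard way to declare a mediator active: the indirect effect requires both path coefficients to be nonzero.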
Feature screening is an important tool in analysing ultrahigh‐dimensional data, particularly in omics and oncology studies. However, most attention has been focused on identifying features that have a linear or monotonic impact on the response variable. Detecting a sparse set of variables that have a nonlinear or nonmonotonic relationship with the response variable is still a challenging task. To fill the gap, this paper proposes a robust model‐free screening approach for right‐censored survival data that quantifies the covariate effect on the restricted mean survival time, rather than on the routinely used hazard function. The proposed measure, based on the difference between the restricted mean survival time of covariate‐stratified and overall data, is able to identify comprehensive types of associations, including linear, nonlinear, nonmonotone and even local dependencies such as change points. The sure screening property is established, and a more flexible iterative screening procedure is developed to increase the accuracy of the variable screening. Simulation studies are carried out to demonstrate the superiority of the proposed method in selecting important features with a complex association with the response variable. The potential of applying the proposed method to interval‐censored failure time data has also been explored in simulations, with promising results. The method is applied to a breast cancer dataset to identify potential prognostic factors, which reveals potential associations between breast cancer and lymphoma.
Yaxian Chen, Kwok Fai Lam and Zhonghua Liu, "High‐dimensional feature screening for nonlinear associations with survival outcome using restricted mean survival time", Stat, 2024-04-07, doi:10.1002/sta4.673.
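The core quantity is easy to compute: the restricted mean survival time (RMST) is the area under the Kaplan-Meier curve up to a horizon tau, and a screening utility can contrast stratified RMSTs with the overall one. The aggregation below (a weighted absolute difference across strata) is a hypothetical simplification of the paper's measure.

```python
import numpy as np

def km_rmst(time, event, tau):
    """RMST: area under the Kaplan-Meier survival curve on [0, tau]."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv, rmst, prev = 1.0, 0.0, 0.0
    for t in np.unique(time[event == 1]):
        if t > tau:
            break
        rmst += surv * (t - prev)            # area of the current flat segment
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        surv *= 1.0 - deaths / at_risk       # Kaplan-Meier step
        prev = t
    return rmst + surv * (tau - prev)        # tail segment up to the horizon

def rmst_utility(z, time, event, tau):
    """Screening utility: weighted absolute gap between stratified and overall RMST."""
    z = np.asarray(z)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    overall = km_rmst(time, event, tau)
    return sum(np.mean(z == g) * abs(km_rmst(time[z == g], event[z == g], tau) - overall)
               for g in np.unique(z))
```

Because the utility depends only on RMST gaps between strata, it picks up nonmonotone and local effects that hazard-based marginal screens can miss.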
This paper explores testing the equality of two covariance matrices under high‐dimensional settings. Existing test statistics are usually constructed based on the squared Frobenius norm or the elementwise maximum norm. However, the former may experience power loss when handling sparse alternatives, while the latter may perform poorly against dense alternatives. In this paper, with a novel framework, we introduce a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we present an analysis of the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of our proposed method.
Wenming Sun, Lingfeng Lyu and Xiao Guo, "Double verification for two‐sample covariance matrices test", Stat, 2024-04-07, doi:10.1002/sta4.670.
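For intuition, the two building blocks mentioned above can be computed directly. A real test standardises these statistics entrywise and calibrates their null distributions, which this sketch omits; it only shows why the Frobenius statistic aggregates dense signal while the maximum statistic isolates sparse signal.

```python
import numpy as np

def cov_test_stats(X, Y):
    """Raw ingredients of many two-sample covariance tests:
    a squared-Frobenius-norm statistic (powerful against dense
    alternatives) and an elementwise maximum statistic (powerful
    against sparse alternatives)."""
    S1 = np.cov(X, rowvar=False)   # sample covariance of sample 1
    S2 = np.cov(Y, rowvar=False)   # sample covariance of sample 2
    diff = S1 - S2
    frob = np.sum(diff ** 2)       # squared Frobenius norm of the difference
    mx = np.max(np.abs(diff))      # elementwise maximum norm
    return frob, mx
```

A "double verification" style procedure would reject when either standardised statistic is extreme, combining the strengths of both.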
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel
Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright and Ida Scheel, "STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model", Stat, 2024-04-05, doi:10.1002/sta4.671.
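As a rough illustration of the state logic (not the authors' continuous-time inhomogeneous Poisson process), a discrete-time toy simulation of the four STAR states on an adjacency matrix might look like this; all transition probabilities here are made-up illustrative values.

```python
import numpy as np

def simulate_star(adj, p_ext, p_peer, p_adopt, steps, rng):
    """Toy STAR spread. States: 0 = Susceptible, 1 = Tattler,
    2 = Adopter, 3 = Removed. Tattlers spread information without
    adopting; both tattlers and adopters influence their neighbours."""
    n = adj.shape[0]
    state = np.zeros(n, dtype=int)
    for _ in range(steps):
        influencing = ((state == 1) | (state == 2)).astype(int)
        exposed = adj @ influencing              # number of influencing neighbours
        for i in range(n):
            if state[i] == 0:
                peer = 1.0 - (1.0 - p_peer) ** exposed[i]
                if rng.random() < p_ext or rng.random() < peer:
                    # an influenced susceptible becomes an adopter or a tattler
                    state[i] = 2 if rng.random() < p_adopt else 1
            elif state[i] in (1, 2) and rng.random() < 0.1:
                state[i] = 3                     # illustrative removal rate
    return state
```

Deleting edges from `adj` mimics the missing-edge robustness experiments described above: the external term `p_ext` partially absorbs peer influence lost to unobserved edges.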
Ultrahigh‐dimensional data analysis has advanced greatly in recent years. When the data are stored across multiple clients that can be connected to each other only through a network structure, ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, where the parameters of interest are estimated over a large number of devices connected by a network structure, without data sharing. Each local machine runs gradient ascent in parallel to obtain estimators via sparsity‐restricted constrained methods. A global model is then obtained by aggregating each machine's information via an alternating direction method of multipliers (ADMM) with a concave pairwise fusion penalty between different machines across the network. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity pattern and provide estimates of all regression coefficients simultaneously. Under mild conditions, we show the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.
Wei Dong and Sanying Feng, "Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning", Stat, 2024-04-05, doi:10.1002/sta4.669.
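To show the split-and-agree mechanics that ADMM brings to federated estimation, here is a global-consensus ADMM toy for a lasso-penalised least squares shared across machines. The paper's method is fully decentralised with a concave pairwise fusion penalty over a network; this centralised-consensus sketch only conveys the general flavour.

```python
import numpy as np

def consensus_admm(datasets, lam=0.1, rho=1.0, iters=50):
    """Global-consensus ADMM: each 'machine' k holds its own (A, b),
    solves a local ridge-regularised least squares, and all machines
    agree on a shared sparse consensus vector z."""
    K = len(datasets)
    p = datasets[0][0].shape[1]
    x = np.zeros((K, p))
    u = np.zeros((K, p))
    z = np.zeros(p)
    for _ in range(iters):
        for k, (A, b) in enumerate(datasets):
            # x-update: solve (A'A + rho I) x_k = A'b + rho (z - u_k)
            x[k] = np.linalg.solve(A.T @ A + rho * np.eye(p),
                                   A.T @ b + rho * (z - u[k]))
        # z-update: soft-thresholding of the averaged iterates (lasso prox)
        v = (x + u).mean(axis=0)
        z = np.sign(v) * np.maximum(np.abs(v) - lam / (rho * K), 0.0)
        u += x - z                     # dual (running residual) update
    return z
```

Each machine touches only its own `(A, b)`; only the local estimates `x[k]` and duals `u[k]` are exchanged, which is the privacy argument behind this family of methods.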