
Stat: Latest Articles

Double verification for two‐sample covariance matrices test
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-04-07 DOI: 10.1002/sta4.670
Wenming Sun, Lingfeng Lyu, Xiao Guo
This paper explores testing the equality of two covariance matrices under high‐dimensional settings. Existing test statistics are usually constructed based on the squared Frobenius norm or the elementwise maximum norm. However, the former may experience power loss when handling sparse alternatives, while the latter may have a poor performance against dense alternatives. In this paper, with a novel framework, we introduce a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we present an analysis of the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of our proposed method.
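To make the contrast between the two norms concrete, here is a minimal Python sketch. It computes naive plug-in versions of the squared Frobenius norm and the elementwise maximum norm of a difference between two covariance matrices. These are not the test statistics from the paper (published tests use bias-corrected, standardised forms); the function names and the 4x4 example matrices are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def sample_cov(X: Matrix) -> Matrix:
    """Unbiased sample covariance; rows of X are observations."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in X) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def frobenius_sq(A: Matrix, B: Matrix) -> float:
    """Squared Frobenius norm of A - B: sums every squared entry."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def max_abs(A: Matrix, B: Matrix) -> float:
    """Elementwise maximum norm of A - B: keeps only the largest entry."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Hypothetical population covariances (assumed for illustration only).
p = 4
identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
sparse_alt = [row[:] for row in identity]
sparse_alt[0][0] += 0.5  # one large difference
dense_alt = [[v + 0.2 for v in row] for row in identity]  # many small differences

print("sparse:", frobenius_sq(identity, sparse_alt), max_abs(identity, sparse_alt))
print("dense: ", frobenius_sq(identity, dense_alt), max_abs(identity, dense_alt))
```

In this toy setting the Frobenius quantity is larger for the dense alternative, while the maximum is larger for the sparse one, which is the trade-off the abstract describes.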
Citations: 0
STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-04-05 DOI: 10.1002/sta4.671
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel
Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.
Citations: 0
Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-04-05 DOI: 10.1002/sta4.669
Wei Dong, Sanying Feng
Ultrahigh‐dimensional data analysis has made great progress in recent years. When the data are stored across multiple clients that can connect to each other only through a network structure, the implementation of ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, where the parameters of interest are estimated across a large number of devices connected by a network structure, without data sharing. On the local machines, each machine runs gradient ascent in parallel to obtain estimators via sparsity‐restricted constrained methods. We also obtain a global model by aggregating each machine's information via an alternating direction method of multipliers (ADMM), using a concave pairwise fusion penalty between different machines through the network structure. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity and provide estimates of all regression coefficients simultaneously. Under mild conditions, we show the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.
Citations: 0
Comparing the effectiveness of $k$‐different treatments through the area under the ROC curve
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-04-05 DOI: 10.1002/sta4.672
Pablo Martínez‐Camblor, Sonia Pérez‐Fernández, Lucas L. Dwiel, Wilder T. Doucette
The area under the receiver‐operating characteristic curve (AUC) has become a popular index not only for measuring the overall prediction capacity of a marker but also for measuring the strength of the association between continuous and binary variables. In the study considered here, the AUC was used for comparing the association size of four different interventions involving impulsive decision making, studied through an animal model, in which each animal provides several negative (pretreatment) and positive (posttreatment) measures. The problem of fully comparing the average AUCs therefore arises naturally. We construct an analysis of variance (ANOVA) type test for testing the equality of the impact of these treatments measured through the respective AUCs and considering the random‐effect represented by the animal. The use (and development) of a post hoc Tukey's HSD‐type test is also considered. We explore the finite‐sample behaviour of our proposal via Monte Carlo simulations, and analyse the data generated from the original problem. An R package implementing the procedures is provided in the supporting information.
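As a rough illustration of the quantity being compared, the sketch below computes the empirical AUC as the Mann-Whitney proportion of (negative, positive) pairs in which the positive measure is larger, counting ties as one half, and then averages it over subjects. It is not the ANOVA-type test or the R package from the paper; the function names and the per-subject data layout are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def empirical_auc(negatives: Sequence[float], positives: Sequence[float]) -> float:
    """Mann-Whitney estimate of P(negative < positive) + 0.5 * P(tie)."""
    if not negatives or not positives:
        raise ValueError("both groups need at least one measurement")
    score = 0.0
    for x in negatives:
        for y in positives:
            if x < y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(negatives) * len(positives))


def mean_auc(subjects: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average of per-subject AUCs; each subject is (pretreatment, posttreatment)."""
    return sum(empirical_auc(neg, pos) for neg, pos in subjects) / len(subjects)


# Hypothetical measurements for two animals under one treatment.
animals = [([1.0, 2.0], [3.0, 4.0]), ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])]
print(mean_auc(animals))
```

An AUC of 0.5 corresponds to no association between the measure and the pre/post label, and values near 1 indicate that posttreatment measures are almost always larger.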
Citations: 0
On the construction of nested orthogonal arrays with the adjacent numbers of levels
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-03-25 DOI: 10.1002/sta4.666
Shanqi Pang, Yan Zhu
Nested orthogonal arrays (NOAs) provide an option for designing an experimental setup consisting of two experiments, with the expensive higher‐precision experiment nested within a larger and relatively inexpensive lower‐precision experiment. Construction of NOAs with the adjacent numbers of levels is a challenging problem. In this paper, we present several methods for constructing such NOAs and obtain some classes of such new symmetric NOAs in which the larger arrays have minimum run size. These methods are also extended to construction of NOAs with more than two layers. Furthermore, by adding some columns to these symmetric NOAs, we can construct a lot of new asymmetric NOAs. Illustrative examples are given.
Citations: 0
Beta regression for double‐bounded response with correlated high‐dimensional covariates
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-03-12 DOI: 10.1002/sta4.663
Jianxuan Liu
Continuous responses measured on a standard unit interval are ubiquitous in many scientific disciplines. Statistical models built upon a normal error structure do not generally work because they can produce biassed estimates or result in predictions outside either bound. In real‐life applications, data are often high‐dimensional, correlated and consist of a mixture of various data types. Little literature is available to address the unique data challenge. We propose a semiparametric approach to analyse the association between a double‐bounded response and high‐dimensional correlated covariates of mixed types. The proposed method makes full use of all available data through one or several linear combinations of the covariates without losing information from the data. The only assumption we make is that the response variable follows a Beta distribution; no additional assumption is required. The resulting estimators are consistent and efficient. We illustrate the proposed method in simulation studies and demonstrate it in a real‐life data application. The semiparametric approach contributes to the sufficient dimension reduction literature for its novelty in investigating double‐bounded response which is absent in the current literature. This work also provides a new tool for data practitioners to analyse the association between a popular unit interval response and mixed types of high‐dimensional correlated covariates.
Citations: 0
Visualisation and outlier detection for probability density function ensembles
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-03-11 DOI: 10.1002/sta4.662
Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer
Exploratory data analysis (EDA) for functional data—data objects where observations are entire functions—is a difficult problem that has seen significant attention in recent literature. This surge in interest is motivated by the ubiquitous nature of functional data, which are prevalent in applications across fields such as meteorology, biology, medicine and engineering. Empirical probability density functions (PDFs) can be viewed as constrained functional data objects that must integrate to one and be nonnegative. They show up in contexts such as yearly income distributions, zooplankton size structure in oceanography and in connectivity patterns in the brain, among others. While PDF data are certainly common in modern research, little attention has been given to EDA specifically for PDFs. In this paper, we extend several methods for EDA on functional data for PDFs and compare them on simulated data that exhibit different types of variation, designed to mimic that seen in real-world applications. We then use our new methods to perform EDA on the breakthrough curves observed in gas transport simulations for underground fracture networks.
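The two constraints mentioned above (nonnegativity and integrating to one) can be sketched for a function sampled on a grid. The snippet below is only an illustration of those constraints using the trapezoid rule; it is not one of the EDA methods from the paper, and the helper names `trapezoid` and `to_density` are hypothetical.

```python
from typing import List, Sequence


def trapezoid(y: Sequence[float], x: Sequence[float]) -> float:
    """Trapezoid-rule integral of y over the (increasing) grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def to_density(values: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Rescale nonnegative values so they integrate to one on the grid."""
    if any(v < 0 for v in values):
        raise ValueError("a density must be nonnegative")
    area = trapezoid(values, grid)
    if area <= 0:
        raise ValueError("values must have positive total area")
    return [v / area for v in values]


# Hypothetical unnormalised curve on a three-point grid.
grid = [0.0, 0.5, 1.0]
raw = [0.0, 4.0, 0.0]
pdf = to_density(raw, grid)
print(pdf, trapezoid(pdf, grid))
```

Any method that averages or perturbs such curves has to preserve both constraints, which is one reason PDF ensembles need dedicated tools rather than generic functional data methods.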
Citations: 0
Optimal designs for crossover model with partial interactions
IF 1.7 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-03-07 DOI: 10.1002/sta4.668
Futao Zhang, Pierre Druilhet, Xiangshun Kong
This paper studies the universally optimal designs for estimating total effects under crossover models with partial interactions. We provide necessary and sufficient conditions for a symmetric design to be universally optimal, based on which algorithms can be used to derive optimal symmetric designs under any form of the within-block covariance matrix. To cope with the computational complexity of algorithms when the experimental scale is too large, we provide the analytical form of optimal designs under the type-H covariance matrix. We find that for a fixed number of treatments, say $t$, the number of distinct treatments appearing in the support sequences increases with the number of periods, $k$, until $k \ge t^2$, in which case all $t$ treatments appear. The optimal designs can be constructed from at most two representative sequences, in which each treatment appears in consecutive periods with equal or nearly equal numbers of replications.
Citations: 0
Graph-based mutually exciting point processes for modelling event times in docked bike-sharing systems
IF 1.7 · Zone 4 (Mathematics) · Q3 Statistics & Probability · Pub Date: 2024-03-07 · DOI: 10.1002/sta4.660
Francesco Sanna Passino, Yining Che, Carlos Cardoso Correia Perello
This paper introduces graph-based mutually exciting processes (GB-MEP) to model event times in network point processes, focusing on an application to docked bike-sharing systems. GB-MEP incorporates known relationships between nodes in a graph within the intensity function of a node-based multivariate Hawkes process. This approach reduces the number of parameters to a quantity proportional to the number of nodes in the network, resulting in significant advantages for computational scalability when compared with traditional methods. The model is applied on event data observed on the Santander Cycles network in central London, demonstrating that exploiting network-wide information related to geographical location of the stations is beneficial to improve the performance of node-based models for applications in bike-sharing systems. The proposed GB-MEP framework is more generally applicable to any network point process where a distance function between nodes is available, demonstrating wider applicability.
Citations: 0
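To make the idea in the entry above concrete, here is a minimal sketch of a node-based Hawkes intensity in which excitation from other nodes is damped by graph distance. The functional form (exponential decay in both time and distance) and every name in it (`intensity`, `mu`, `alpha`, `beta`, `theta`, `dist`) are assumptions chosen for illustration; the abstract does not specify the authors' parametrisation. The sketch only shows how sharing a distance-based kernel can keep the parameter count proportional to the number of nodes.

```python
import math

def intensity(i, t, events, dist, mu, alpha, beta, theta):
    """Conditional intensity of node i at time t (illustrative form only).

    events: dict mapping node -> list of past event times
    dist:   dist[i][j] is a known distance between nodes i and j
    mu, alpha: per-node baseline and excitation scale (hypothetical)
    beta, theta: shared time-decay and distance-decay rates (hypothetical)
    """
    lam = mu[i]
    for j, times in events.items():
        # Distance-based weight: farther nodes excite node i less.
        weight = alpha[i] * math.exp(-theta * dist[i][j])
        for s in times:
            if s < t:
                # Exponential kernel: influence of an event fades over time.
                lam += weight * beta * math.exp(-beta * (t - s))
    return lam
```

With `n` nodes this form has `2n + 2` free parameters (`mu`, `alpha`, `beta`, `theta`), whereas a fully general multivariate Hawkes process needs an `n × n` excitation matrix.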
Table inference for combinatorial origin‐destination choices in agent‐based population synthesis
IF 1.7 · Zone 4 (Mathematics) · Q3 Statistics & Probability · Pub Date: 2024-03-06 · DOI: 10.1002/sta4.656
Ioannis Zachos, Theodoros Damoulas, Mark Girolami
A key challenge in agent‐based mobility simulations is the synthesis of individual agent socioeconomic profiles. Such profiles include locations of agent activities, which dictate the quality of the simulated travel patterns. These locations are typically represented in origin‐destination matrices that are sampled using coarse travel surveys. This is because fine‐grained trip profiles are scarce and fragmented due to privacy and cost reasons. The discrepancy between data and sampling resolutions renders agent traits nonidentifiable due to the combinatorial space of data‐consistent individual attributes. This problem is pertinent to any agent‐based inference setting where the latent state is discrete. Existing approaches have used continuous relaxations of the underlying location assignments and subsequent ad hoc discretisation thereof. We propose a framework to efficiently navigate this space offering improved reconstruction and coverage as well as linear‐time sampling of the ground truth origin‐destination table. This allows us to avoid factorially growing rejection rates and poor summary statistic consistency inherent in discrete choice modelling. We achieve this by introducing joint sampling schemes for the continuous intensity and discrete table of agent trips, as well as Markov bases that can efficiently traverse this combinatorial space subject to summary statistic constraints. Our framework's benefits are demonstrated in multiple controlled experiments and a large‐scale application to agent work trip reconstruction in Cambridge, UK.
Citations: 0
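The entry above mentions Markov bases that traverse the space of tables consistent with given summary statistics. As a minimal sketch, assuming the constraints are simply fixed row and column sums of a two-way origin-destination table, the classical basic move adds +1 and -1 on the corners of a 2×2 sub-table, which leaves every margin unchanged. The function names (`margins`, `markov_basis_step`, `random_walk`) are hypothetical, and this plain random walk is not the joint sampling scheme the paper proposes.

```python
import random

def margins(table):
    """Return (row sums, column sums) of a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols

def markov_basis_step(table, rng):
    """Apply one basic move; keep the current table if a cell would go negative."""
    n_rows, n_cols = len(table), len(table[0])
    i, k = rng.sample(range(n_rows), 2)   # two distinct rows
    j, l = rng.sample(range(n_cols), 2)   # two distinct columns
    new = [row[:] for row in table]       # copy so the input is not mutated
    new[i][j] += 1
    new[k][l] += 1
    new[i][l] -= 1
    new[k][j] -= 1
    if new[i][l] < 0 or new[k][j] < 0:
        return table                      # reject: move leaves the feasible set
    return new

def random_walk(table, steps, seed=0):
    """Walk over non-negative integer tables sharing the margins of `table`."""
    rng = random.Random(seed)
    for _ in range(steps):
        table = markov_basis_step(table, rng)
    return table
```

Every table visited by `random_walk` has the same row and column sums as the starting table, so the walk never needs a separate rejection step for margin violations; only non-negativity is checked.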