This paper explores testing the equality of two covariance matrices in high‐dimensional settings. Existing test statistics are usually constructed from either the squared Frobenius norm or the elementwise maximum norm. However, the former may lose power against sparse alternatives, while the latter may perform poorly against dense alternatives. In this paper, we introduce, within a novel framework, a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we analyse the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of the proposed method.
Title: Double verification for two‐sample covariance matrices test | Authors: Wenming Sun, Lingfeng Lyu, Xiao Guo | Journal: Stat | Published: 2024-04-07 | DOI: 10.1002/sta4.670 (https://doi.org/10.1002/sta4.670)
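The abstract contrasts the two classical ways of measuring how far apart two covariance matrices are. The plain-Python sketch below illustrates only that contrast. It is not the authors' double verification or adaptive weight statistic, and the helper names are hypothetical. Practical tests also standardise and bias-correct these quantities before comparing them with a null distribution, which is omitted here. The sketch compares a dense shift (every entry moved by 0.1) and a sparse shift (one off-diagonal pair moved by 1.0) against the identity.

```python
from typing import List

Matrix = List[List[float]]


def sample_cov(x: Matrix) -> Matrix:
    """Unbiased sample covariance of n observations (rows) in p dimensions (columns)."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in x) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of A - B: every entrywise difference contributes."""
    return sum((aij - bij) ** 2 for ra, rb in zip(a, b) for aij, bij in zip(ra, rb))


def max_abs(a: Matrix, b: Matrix) -> float:
    """Elementwise maximum norm of A - B: only the single largest difference counts."""
    return max(abs(aij - bij) for ra, rb in zip(a, b) for aij, bij in zip(ra, rb))


def identity(p: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    p = 50
    base = identity(p)
    # Dense alternative: all p * p entries shifted by a small amount.
    dense = [[base[i][j] + 0.1 for j in range(p)] for i in range(p)]
    # Sparse alternative: one symmetric off-diagonal pair shifted by a large amount.
    sparse = [row[:] for row in base]
    sparse[0][1] = sparse[1][0] = 1.0
    print("dense: ", frobenius_sq(dense, base), max_abs(dense, base))
    print("sparse:", frobenius_sq(sparse, base), max_abs(sparse, base))
```

The dense shift gives a squared Frobenius distance of about 25 but a maximum difference of only about 0.1. The sparse shift gives 2.0 and 1.0. The Frobenius norm is therefore driven by the accumulation of many small differences, and the maximum norm by a single large one.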
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel
Adoption of an innovation, such as a product, service or idea, is typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust to missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.
Title: STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model | Authors: Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel | Journal: Stat | Published: 2024-04-05 | DOI: 10.1002/sta4.671 (https://doi.org/10.1002/sta4.671)
Ultrahigh‐dimensional data analysis has seen considerable progress in recent years. When the data are stored on multiple clients that can communicate with one another only through a network structure, ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, in which the parameters of interest are estimated across a large number of networked devices without sharing data. On each local machine, gradient ascent is run in parallel to obtain estimators via sparsity‐restricted constrained methods. We then obtain a global model by aggregating the machines' information through an alternating direction method of multipliers (ADMM) that applies a concave pairwise fusion penalty between machines connected in the network. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity and provide estimates of all regression coefficients simultaneously. Under mild conditions, we establish the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.
Title: Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning | Authors: Wei Dong, Sanying Feng | Journal: Stat | Published: 2024-04-05 | DOI: 10.1002/sta4.669 (https://doi.org/10.1002/sta4.669)
Pablo Martínez‐Camblor, Sonia Pérez‐Fernández, Lucas L. Dwiel, Wilder T. Doucette
The area under the receiver‐operating characteristic curve (AUC) has become a popular index, not only for measuring the overall predictive capacity of a marker but also for measuring the strength of the association between continuous and binary variables. In the study considered here, the AUC was used to compare the association size of four different interventions involving impulsive decision making, studied through an animal model in which each animal provides several negative (pretreatment) and positive (posttreatment) measures. A full comparison of the average AUCs therefore arises naturally. We construct an analysis of variance (ANOVA)‐type test for the equality of the treatment effects, measured through the respective AUCs, that accounts for the random effect represented by the animal. The use (and development) of a post hoc Tukey's HSD‐type test is also considered. We explore the finite‐sample behaviour of our proposal via Monte Carlo simulations and analyse the data generated from the original problem. An R package implementing the procedures is provided in the supporting information.
Title: Comparing the effectiveness of $k$‐different treatments through the area under the ROC curve | Authors: Pablo Martínez‐Camblor, Sonia Pérez‐Fernández, Lucas L. Dwiel, Wilder T. Doucette | Journal: Stat | Published: 2024-04-05 | DOI: 10.1002/sta4.672 (https://doi.org/10.1002/sta4.672)
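The abstract uses the AUC as an index of association between a continuous measure and a binary label, here pretreatment versus posttreatment. The minimal sketch below assumes the usual Mann-Whitney form of the empirical AUC. This is the estimated probability that a posttreatment measure exceeds a pretreatment one, with ties counted as one half. The sketch computes one AUC per animal and averages them with equal weights. The function names and the equal weighting are our assumptions. It is not the authors' ANOVA-type test or their R package.

```python
from statistics import mean
from typing import Dict, List, Tuple


def empirical_auc(negatives: List[float], positives: List[float]) -> float:
    """Mann-Whitney estimate of P(positive > negative), counting ties as 1/2."""
    score = 0.0
    for y in positives:
        for x in negatives:
            if y > x:
                score += 1.0
            elif y == x:
                score += 0.5
    return score / (len(negatives) * len(positives))


def mean_auc(animals: Dict[str, Tuple[List[float], List[float]]]) -> float:
    """Equal-weight average of per-animal AUCs.

    Each value is (pretreatment measures, posttreatment measures) for one animal,
    so every AUC compares measures taken on the same animal.
    """
    return mean(empirical_auc(pre, post) for pre, post in animals.values())


if __name__ == "__main__":
    animals = {
        "a1": ([1.0, 2.0, 3.0], [2.0, 4.0, 5.0]),
        "a2": ([0.5, 1.0], [1.5, 2.0]),
    }
    print(mean_auc(animals))
```

For the first hypothetical animal, 7 of the 9 pre/post pairs are ordered and one is tied, giving an AUC of 7.5/9. The second animal's measures separate completely, giving 1.0.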
Nested orthogonal arrays (NOAs) provide an option for designing an experimental setup consisting of two experiments, with the expensive higher‐precision experiment nested within a larger and relatively inexpensive lower‐precision experiment. Constructing NOAs with adjacent numbers of levels is a challenging problem. In this paper, we present several methods for constructing such NOAs and obtain several new classes of symmetric NOAs of this kind in which the larger arrays have minimum run size. These methods are also extended to the construction of NOAs with more than two layers. Furthermore, by adding columns to these symmetric NOAs, we construct many new asymmetric NOAs. Illustrative examples are given.
Title: On the construction of nested orthogonal arrays with the adjacent numbers of levels | Authors: Shanqi Pang, Yan Zhu | Journal: Stat | Published: 2024-03-25 | DOI: 10.1002/sta4.666 (https://doi.org/10.1002/sta4.666)
Continuous responses measured on the standard unit interval are ubiquitous in many scientific disciplines. Statistical models built on a normal error structure generally do not work, because they can produce biassed estimates or predictions outside either bound. In real‐life applications, data are often high‐dimensional and correlated and consist of a mixture of data types. Little existing literature addresses this combination of challenges. We propose a semiparametric approach to analyse the association between a double‐bounded response and high‐dimensional correlated covariates of mixed types. The proposed method makes full use of the available data through one or several linear combinations of the covariates, without losing information. The only assumption we make is that the response variable follows a Beta distribution; no additional assumption is required. The resulting estimators are consistent and efficient. We illustrate the proposed method in simulation studies and demonstrate it in a real‐life data application. The semiparametric approach contributes to the sufficient dimension reduction literature by investigating double‐bounded responses, which that literature has not previously addressed. This work also provides data practitioners with a new tool for analysing the association between a commonly encountered unit‐interval response and high‐dimensional correlated covariates of mixed types.
Title: Beta regression for double‐bounded response with correlated high‐dimensional covariates | Authors: Jianxuan Liu | Journal: Stat | Published: 2024-03-12 | DOI: 10.1002/sta4.663 (https://doi.org/10.1002/sta4.663)
Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer
Exploratory data analysis (EDA) for functional data (data objects where observations are entire functions) is a difficult problem that has seen significant attention in recent literature. This surge in interest is motivated by the ubiquitous nature of functional data, which are prevalent in applications across fields such as meteorology, biology, medicine and engineering. Empirical probability density functions (PDFs) can be viewed as constrained functional data objects that must integrate to one and be nonnegative. They show up in contexts such as yearly income distributions, zooplankton size structure in oceanography and connectivity patterns in the brain, among others. While PDF data are certainly common in modern research, little attention has been given to EDA specifically for PDFs. In this paper, we extend several EDA methods for functional data to PDFs and compare them on simulated data that exhibit different types of variation, designed to mimic the variation seen in real-world applications. We then use our new methods to perform EDA on the breakthrough curves observed in gas transport simulations for underground fracture networks.
Title: Visualisation and outlier detection for probability density function ensembles | Authors: Alexander C. Murph, Justin D. Strait, Kelly R. Moran, Jeffrey D. Hyman, Philip H. Stauffer | Journal: Stat | Published: 2024-03-11 | DOI: 10.1002/sta4.662 (https://doi.org/10.1002/sta4.662)
This paper studies universally optimal designs for estimating total effects under crossover models with partial interactions. We provide necessary and sufficient conditions for a symmetric design to be universally optimal. Based on these conditions, algorithms can be used to derive optimal symmetric designs under any form of the within-block covariance matrix. To cope with the computational complexity of such algorithms when the experimental scale is large, we provide the analytical form of optimal designs under the type-H covariance matrix. We find that, for a fixed number of treatments $t$, the number of distinct treatments appearing in the support sequences increases with the number of periods $k$ until $k \geq t^2$, at which point all $t$ treatments appear. The optimal design can be constructed from at most two representative sequences, in which each treatment appears in consecutive periods with equal or nearly equal numbers of replications.
Title: Optimal designs for crossover model with partial interactions | Authors: Futao Zhang, Pierre Druilhet, Xiangshun Kong | Journal: Stat | Published: 2024-03-07 | DOI: 10.1002/sta4.668 (https://doi.org/10.1002/sta4.668)
Francesco Sanna Passino, Yining Che, Carlos Cardoso Correia Perello
This paper introduces graph-based mutually exciting processes (GB-MEP) to model event times in network point processes, focusing on an application to docked bike-sharing systems. GB-MEP incorporates known relationships between nodes in a graph within the intensity function of a node-based multivariate Hawkes process. This approach reduces the number of parameters to a quantity proportional to the number of nodes in the network, giving significant advantages in computational scalability over traditional methods. The model is applied to event data observed on the Santander Cycles network in central London. The results demonstrate that exploiting network-wide information related to the geographical location of the stations improves the performance of node-based models for bike-sharing applications. More generally, the proposed GB-MEP framework is applicable to any network point process for which a distance function between nodes is available.
Title: Graph-based mutually exciting point processes for modelling event times in docked bike-sharing systems | Authors: Francesco Sanna Passino, Yining Che, Carlos Cardoso Correia Perello | Journal: Stat | Published: 2024-03-07 | DOI: 10.1002/sta4.660 (https://doi.org/10.1002/sta4.660)
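According to the abstract, GB-MEP lets known relationships between stations enter the intensity of a node-based multivariate Hawkes process. This makes the parameter count grow with the number of nodes rather than with its square. The sketch below uses a hypothetical parametrisation chosen only to illustrate that idea; it is not the model in the paper. Each node has its own baseline rate `mu` and excitation scale `alpha`. Excitation from other nodes is down-weighted by exp(−distance/bandwidth), and all nodes share one exponential decay rate `beta`. That gives 2N + 2 parameters for N nodes, against N² excitation weights in an unrestricted multivariate Hawkes process.

```python
import math
from typing import Dict, List, Sequence


def intensity(
    node: int,
    t: float,
    events: Dict[int, List[float]],
    distance: Sequence[Sequence[float]],
    mu: Sequence[float],
    alpha: Sequence[float],
    beta: float,
    bandwidth: float,
) -> float:
    """Conditional intensity of `node` at time t under a hypothetical
    distance-weighted multivariate Hawkes process:

        lambda_i(t) = mu_i + alpha_i * sum_j w_ij * sum_{s in events[j], s < t} beta * exp(-beta * (t - s)),
        w_ij = exp(-d_ij / bandwidth).

    `events` maps node index to past event times; nodes with no events may be omitted.
    Parameters: mu and alpha (one each per node) plus shared beta and bandwidth.
    """
    rate = mu[node]
    for j, times in events.items():
        weight = math.exp(-distance[node][j] / bandwidth)  # equals 1 for the node itself
        excitation = sum(beta * math.exp(-beta * (t - s)) for s in times if s < t)
        rate += alpha[node] * weight * excitation
    return rate


if __name__ == "__main__":
    d = [[0.0, 1.0], [1.0, 0.0]]
    print(intensity(1, 1.0, {0: [0.0]}, d, [0.5, 0.2], [0.3, 0.1], beta=1.0, bandwidth=1.0))
```

Because cross-excitation is tied to a distance function rather than estimated pair by pair, nearby stations influence each other more strongly without any extra per-pair parameters.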
A key challenge in agent‐based mobility simulations is the synthesis of individual agent socioeconomic profiles. Such profiles include the locations of agent activities, which dictate the quality of the simulated travel patterns. These locations are typically represented in origin‐destination matrices that are sampled using coarse travel surveys, because fine‐grained trip profiles are scarce and fragmented for privacy and cost reasons. The discrepancy between data and sampling resolutions renders agent traits nonidentifiable, since the space of individual attributes consistent with the data is combinatorially large. This problem is pertinent to any agent‐based inference setting in which the latent state is discrete. Existing approaches have used continuous relaxations of the underlying location assignments, followed by ad hoc discretisation. We propose a framework that efficiently navigates this space, offering improved reconstruction and coverage as well as linear‐time sampling of the ground‐truth origin‐destination table. This allows us to avoid the factorially growing rejection rates and the poor consistency with summary statistics that are inherent in discrete choice modelling. We achieve this by introducing joint sampling schemes for the continuous intensity and the discrete table of agent trips, together with Markov bases that can efficiently traverse this combinatorial space subject to summary statistic constraints. The benefits of our framework are demonstrated in multiple controlled experiments and in a large‐scale application to reconstructing agent work trips in Cambridge, UK.
Table inference for combinatorial origin‐destination choices in agent‐based population synthesis. Ioannis Zachos, Theodoros Damoulas, Mark Girolami. Stat (journal article), published 2024-03-06. DOI: 10.1002/sta4.656 (https://doi.org/10.1002/sta4.656).
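To illustrate how a Markov basis lets a sampler move between discrete tables without violating summary statistics, the following sketch runs a Metropolis-Hastings random walk over origin-destination tables. In these tables the row totals (trips leaving each origin) and the column totals (trips arriving at each destination) are held fixed. The sketch relies on the classical result of Diaconis and Sturmfels that the basic ±1 moves on 2×2 minors form a Markov basis for two-way tables with both margins fixed. The target distribution and the function name `mh_table_step` are assumptions made for illustration. The target treats cells as independent Poisson counts with means `lam_ij`, conditioned on both margins. The paper's joint sampling schemes for the continuous intensity and the discrete table are not reproduced here.

```python
import math

import numpy as np


def mh_table_step(table, log_lam, rng):
    """One Metropolis-Hastings step over non-negative integer tables whose row and
    column sums are held fixed (requires at least 2 rows and 2 columns).

    Illustrative target (an assumption, not the paper's posterior): independent
    Poisson cells with means lam_ij, conditioned on both margins, so that
        pi(T) is proportional to prod_ij lam_ij**T_ij / T_ij!
    """
    n_rows, n_cols = table.shape
    # Basic Markov-basis move: pick two distinct rows, two distinct columns, a sign.
    i1, i2 = rng.choice(n_rows, size=2, replace=False)
    j1, j2 = rng.choice(n_cols, size=2, replace=False)
    sign = 1 if rng.random() < 0.5 else -1
    # +sign on (i1, j1) and (i2, j2), -sign on (i1, j2) and (i2, j1):
    # every row total and every column total is left unchanged.
    cells = [(i1, j1, sign), (i2, j2, sign), (i1, j2, -sign), (i2, j1, -sign)]
    log_ratio = 0.0
    for i, j, step in cells:
        old = int(table[i, j])
        new = old + step
        if new < 0:
            return table  # the move would leave the set of valid tables: reject
        # log(lam^new / new!) - log(lam^old / old!)
        log_ratio += step * log_lam[i, j] - (math.lgamma(new + 1) - math.lgamma(old + 1))
    # The proposal is symmetric, so the acceptance ratio is just the target ratio.
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        proposal = table.copy()
        for i, j, step in cells:
            proposal[i, j] += step
        return proposal
    return table


# Two origins (rows) and three destinations (columns).
rng = np.random.default_rng(0)
start = np.array([[3, 1, 0], [0, 2, 4]])
log_lam = np.log(np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 3.0]]))
table = start.copy()
for _ in range(2000):
    table = mh_table_step(table, log_lam, rng)
print(table)
```

Each move changes four cells by ±1 and leaves every margin intact. As a result, the chain never proposes a table that is inconsistent with the fixed totals. A proposal is rejected only when a count would turn negative or when it fails the Metropolis-Hastings acceptance test.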