The minimum moment aberration and the minimum Lee‐moment aberration criteria are two popular, conceptually simple and computationally cheap criteria for selecting good designs. However, the minimum moment aberration criterion is suitable only for qualitative factors, and the minimum Lee‐moment aberration criterion cannot distinguish some designs with high‐level quantitative factors. In this paper, the minimum absolute‐moment aberration criterion is proposed to compare and select designs with multi‐level quantitative factors. We validate the statistical justification of this criterion from both theoretical and numerical perspectives. Furthermore, we extend the minimum absolute‐moment aberration criterion to screening designs with both qualitative and quantitative factors, naming the new criterion the minimum mixed‐moment aberration criterion. We then utilise a numerical study to compare and evaluate the performance of some popular designs with both qualitative and quantitative factors in computer experiments.
Yao Xiao, Na Zou, Hong Qin and Kang Wang, "Generalised minimum moment aberration for designs with both qualitative and quantitative factors", Stat, 2024-05-01, doi:10.1002/sta4.684.
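As an illustration of the moment-style computations these criteria rest on, the sketch below computes a coincidence-based power moment (the quantity behind minimum moment aberration for qualitative factors) and, as a hypothetical stand-in for the absolute-moment idea, a moment of pairwise absolute differences. The paper's exact definitions and scalings may differ; this is only a sketch of the general mechanics.

```python
import numpy as np

def power_moment(design, q):
    """q-th power moment based on row coincidences (qualitative factors):
    the average over row pairs of (number of coinciding entries)^q."""
    design = np.asarray(design)
    n = design.shape[0]
    vals = []
    for i in range(n):
        for j in range(i + 1, n):
            delta = np.sum(design[i] == design[j])  # coincidences between rows i, j
            vals.append(delta ** q)
    return float(np.mean(vals))

def abs_moment(design, q):
    """q-th moment of the pairwise L1 (absolute-difference) row distances,
    an illustrative distance better suited to quantitative levels."""
    design = np.asarray(design, dtype=float)
    n = design.shape[0]
    vals = []
    for i in range(n):
        for j in range(i + 1, n):
            vals.append(np.sum(np.abs(design[i] - design[j])) ** q)
    return float(np.mean(vals))
```

Small moments mean rows are, on average, far apart, which is why designs are ranked by sequentially minimising these moments.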
Despite the increasing importance of high‐dimensional varying coefficient models, the study of their Bayesian versions is still in its infancy. This paper contributes to the literature by developing a sparse empirical Bayes formulation that addresses the problem of high‐dimensional model selection in the framework of Bayesian varying coefficient modelling under Gaussian process (GP) priors. To break the computational bottleneck of GP‐based varying coefficient modelling, we introduce a low‐cost computation strategy that incorporates linear algebra techniques and the Laplace approximation into the evaluation of the high‐dimensional posterior model distribution. A simulation study is conducted to demonstrate the superiority of the proposed Bayesian method compared to an existing high‐dimensional varying coefficient modelling approach. In addition, its applicability to real data analysis is illustrated using yeast cell cycle data.
Myungjin Kim and Gyuhyeong Goh, "A sparse empirical Bayes approach to high‐dimensional Gaussian process‐based varying coefficient models", Stat, 2024-04-20, doi:10.1002/sta4.678.
Emily Slade, Sarah Jane K. Robbins, Kristen J. McQuerry, Anthony A. Mangino
Collaborative biostatistics units within universities and academic medical centres operate under a wide range of different funding models; common to many of these models is the challenge of allocating time to activities that are not linked to a specific research project, such as professional development, mentorship and administrative tasks. The purpose of this paper is to describe a proposed model for ‘flexible funding’, that is, funding that is not linked to a specific research project, within a collaborative biostatistics unit and to detail the benefits and challenges associated with the proposed model. We present results from a qualitative study representing the perspectives of collaborative biostatisticians working under the proposed flexible funding model. In addition to providing examples of activities undertaken as part of time allocated to flexible funding, the qualitative results reveal several benefits of flexible funding both for a collaborative biostatistician (e.g., job satisfaction and professional development) and for the collaborative biostatistics unit as a whole (e.g., retention, process improvement, and leadership).
Emily Slade, Sarah Jane K. Robbins, Kristen J. McQuerry and Anthony A. Mangino, "The value of flexible funding for collaborative biostatistics units in universities and academic medical centres", Stat, 2024-04-20, doi:10.1002/sta4.679.
Nicola Hewett, Lee Fawcett, Andrew Golightly, Neil Thorpe
Improving road safety is hugely important, with the number of deaths on the world's roads remaining unacceptably high; an estimated 1.3 million people die each year as a result of road traffic collisions. Current practice for treating collision hotspots is almost always reactive: once a threshold level of collisions has been exceeded during some pre‐determined observation period, treatment is applied (e.g., road safety cameras). Traffic collisions are rare, so prolonged observation periods are necessary. However, traffic conflicts are more frequent and carry only a fraction of the social cost; hence, traffic conflict before/after studies can be conducted over shorter time periods. We investigate the effect of implementing the leading pedestrian interval treatment at signalised intersections as a safety intervention in a city in North America. Pedestrian‐vehicle traffic conflict data were collected from treatment and control sites during the before and after periods. We implement a before/after study on post‐encroachment times (PETs), where small PET values denote ‘near‐misses’. Hence, extreme value theory is employed to model the extremes of our PET processes, with adjustments to the usual modelling framework to account for temporal dependence and treatment effects.
Nicola Hewett, Lee Fawcett, Andrew Golightly and Neil Thorpe, "Using extreme value theory to evaluate the leading pedestrian interval road safety intervention", Stat, 2024-04-18, doi:10.1002/sta4.676.
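To make the modelling idea concrete, here is a minimal peaks-over-threshold sketch on synthetic PET data: because near-misses are small PETs, the sample is negated so the lower tail becomes an upper tail, and a generalised Pareto distribution is fitted to the threshold exceedances. This omits the paper's adjustments for temporal dependence and treatment effects, and the exponential synthetic PETs are purely illustrative.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
pets = rng.exponential(scale=2.0, size=2000)    # synthetic PETs in seconds

# Near-misses are *small* PETs, so negate and study the upper tail of -PET.
x = -pets
u = np.quantile(x, 0.95)                        # high threshold on the negated scale
excesses = x[x > u] - u

# Peaks-over-threshold: fit a generalised Pareto to the exceedances.
shape, loc, scale = genpareto.fit(excesses, floc=0.0)

# Estimated probability of a PET below 0.05 s, i.e. P(-PET > -0.05).
p_u = np.mean(x > u)
p_near_miss = p_u * genpareto.sf(-0.05 - u, shape, loc=0.0, scale=scale)
```

The fitted tail gives estimates for events rarer than anything observed, which is what makes EVT attractive for near-miss surrogate studies.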
C. Taylor Brown, Megan Mehta, Mahathi Ryali, Xiaoran Dong, Iliya Shadfar, Jacqueline Dominquez Davalos, Aaron Culich, Anthony Suen
As one of the largest data science research incubator initiatives in the country, the University of California, Berkeley's Data Science Discovery Program serves as a case study for a scalable and sustainable model of data science consulting in higher education. This case contributes to the broader literature on data science consulting in higher education by analysing the programme's development; institutional influences; staffing and structural model; and defining features, which may prove instructive to similar programmes at other institutions. The programme is characterised by a unique structure of undergraduate consultations led by graduate student mentorship and governance; a streamlined, multidepartmental model that facilitates scalability and sustainability; and diverse modes for undergraduate consulting—including one‐on‐one ad‐hoc data science consultations, extended data science project development and management, peer mentorship and data science workshop instruction. This case demonstrates that universities may be able to initiate a low‐stakes, small‐scale data science consulting initiative and then progressively scale up the project in collaboration with multiple departments and organisations across campus.
C. Taylor Brown, Megan Mehta, Mahathi Ryali, Xiaoran Dong, Iliya Shadfar, Jacqueline Dominquez Davalos, Aaron Culich and Anthony Suen, "The data science discovery program: A model for data science consulting in higher education", Stat, 2024-04-18, doi:10.1002/sta4.677.
Mediation analysis aims to unveil the underlying relationship between an outcome variable and an exposure variable through one or more intermediate variables called mediators. In recent decades, research on mediation analysis has focused on multivariate mediation models, where the number of mediating variables may be high-dimensional. This paper concerns high-dimensional mediation analysis and proposes a three-step algorithm that extracts and utilizes the inter-connectivity among candidate mediators. More specifically, the proposed methodology starts with a screening procedure to reduce the dimensionality of the initial set of candidate mediators, followed by a penalized regression model that incorporates both parameter- and group-wise regularization, and ends with fitting a multivariate mediation model and identifying active mediating variables through a joint significance test. To showcase the performance of the proposed algorithm, we conducted two simulation studies in high-dimensional and ultra-high-dimensional settings, respectively. Furthermore, we demonstrate the practical applicability of the proposal using a real data set that uncovers the possible impact of environmental toxicants on women's gestational age at delivery through 61 biomarkers belonging to 7 biological pathways.
Jia Yuan Hu, Marley DeSimone and Qing Wang, "Utilizing latent connectivity among mediators in high-dimensional mediation analysis", Stat, 2024-04-16, doi:10.1002/sta4.675.
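A toy version of the screen, penalise, then jointly test pipeline might look as follows. The correlation screen, the plain (non-group) lasso solved by proximal gradient and the 0.05 cut-offs are all simplifications of the paper's actual algorithm, which additionally exploits group-wise regularisation over connected mediators.

```python
import numpy as np
from scipy.stats import linregress

def lasso_ista(A, b, alpha, iters=500):
    """Plain lasso via proximal gradient (ISTA); stands in for the
    paper's parameter- and group-wise penalised regression."""
    n, p = A.shape
    L = np.linalg.norm(A, 2) ** 2 / n            # Lipschitz constant of the gradient
    x = np.zeros(p)
    for _ in range(iters):
        x = x - (A.T @ (A @ x - b) / n) / L      # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - alpha / L, 0.0)  # soft-threshold
    return x

def three_step_mediation(X, M, y, n_keep=10, alpha=0.05):
    """Step 1: screen mediators by |correlation| with the outcome.
    Step 2: penalised outcome regression on the screened set.
    Step 3: joint significance test on both paths X -> M_j and M_j -> y."""
    p = M.shape[1]
    score = np.abs([np.corrcoef(M[:, j], y)[0, 1] for j in range(p)])
    keep = np.argsort(score)[::-1][:n_keep]
    coef = lasso_ista(M[:, keep], y, alpha)
    active = keep[coef != 0]
    mediators = []
    for j in active:
        p_alpha = linregress(X, M[:, j]).pvalue  # exposure -> mediator path
        p_beta = linregress(M[:, j], y).pvalue   # mediator -> outcome path
        if max(p_alpha, p_beta) < 0.05:          # joint significance: both must hold
            mediators.append(int(j))
    return mediators
```

The joint significance (max-p) test in step 3 is the standard way to declare a mediator active: the indirect effect requires both path coefficients to be nonzero.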
Feature screening is an important tool in analysing ultrahigh‐dimensional data, particularly in omics and oncology studies. However, most attention has been focused on identifying features that have a linear or monotonic impact on the response variable. Detecting a sparse set of variables that have a nonlinear or nonmonotonic relationship with the response variable is still a challenging task. To fill the gap, this paper proposes a robust model‐free screening approach for right‐censored survival data that quantifies the covariate effect on the restricted mean survival time, rather than on the routinely used hazard function. The proposed measure, based on the difference between the restricted mean survival time of covariate‐stratified and overall data, is able to identify comprehensive types of associations, including linear, nonlinear, nonmonotone and even local dependencies such as change points. The sure screening property is established, and a more flexible iterative screening procedure is developed to increase the accuracy of the variable screening. Simulation studies are carried out to demonstrate the superiority of the proposed method in selecting important features with a complex association with the response variable. The potential of applying the proposed method to interval‐censored failure time data has also been explored in simulations, with promising results. The method is applied to a breast cancer dataset to identify potential prognostic factors, which reveals potential associations between breast cancer and lymphoma.
Yaxian Chen, Kwok Fai Lam and Zhonghua Liu, "High‐dimensional feature screening for nonlinear associations with survival outcome using restricted mean survival time", Stat, 2024-04-07, doi:10.1002/sta4.673.
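The core quantity is easy to compute: the restricted mean survival time (RMST) is the area under the Kaplan-Meier curve up to a horizon tau, and a screening utility can contrast stratified RMSTs with the overall one. The aggregation below (a weighted absolute difference across strata) is a hypothetical simplification of the paper's measure.

```python
import numpy as np

def km_rmst(time, event, tau):
    """RMST: area under the Kaplan-Meier survival curve on [0, tau]."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv, rmst, prev = 1.0, 0.0, 0.0
    for t in np.unique(time[event == 1]):
        if t > tau:
            break
        rmst += surv * (t - prev)            # area of the current flat segment
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        surv *= 1.0 - deaths / at_risk       # Kaplan-Meier step
        prev = t
    return rmst + surv * (tau - prev)        # tail segment up to the horizon

def rmst_utility(z, time, event, tau):
    """Screening utility: weighted absolute gap between stratified and overall RMST."""
    z = np.asarray(z)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    overall = km_rmst(time, event, tau)
    return sum(np.mean(z == g) * abs(km_rmst(time[z == g], event[z == g], tau) - overall)
               for g in np.unique(z))
```

Because the utility depends only on RMST gaps between strata, it picks up nonmonotone and local effects that hazard-based marginal screens can miss.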
This paper explores testing the equality of two covariance matrices under high‐dimensional settings. Existing test statistics are usually constructed based on the squared Frobenius norm or the elementwise maximum norm. However, the former may experience power loss when handling sparse alternatives, while the latter may perform poorly against dense alternatives. In this paper, with a novel framework, we introduce a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we present an analysis of the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of our proposed method.
Wenming Sun, Lingfeng Lyu and Xiao Guo, "Double verification for two‐sample covariance matrices test", Stat, 2024-04-07, doi:10.1002/sta4.670.
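For intuition, the two building blocks mentioned above can be computed directly. A real test standardises these statistics entrywise and calibrates their null distributions, which this sketch omits; it only shows why the Frobenius statistic aggregates dense signal while the maximum statistic isolates sparse signal.

```python
import numpy as np

def cov_test_stats(X, Y):
    """Raw ingredients of many two-sample covariance tests:
    a squared-Frobenius-norm statistic (powerful against dense
    alternatives) and an elementwise maximum statistic (powerful
    against sparse alternatives)."""
    S1 = np.cov(X, rowvar=False)   # sample covariance of sample 1
    S2 = np.cov(Y, rowvar=False)   # sample covariance of sample 2
    diff = S1 - S2
    frob = np.sum(diff ** 2)       # squared Frobenius norm of the difference
    mx = np.max(np.abs(diff))      # elementwise maximum norm
    return frob, mx
```

A "double verification" style procedure would reject when either standardised statistic is extreme, combining the strengths of both.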
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel
Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright and Ida Scheel, "STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model", Stat, 2024-04-05, doi:10.1002/sta4.671.
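As a rough illustration of the state logic (not the authors' continuous-time inhomogeneous Poisson process), a discrete-time toy simulation of the four STAR states on an adjacency matrix might look like this; all transition probabilities here are made-up illustrative values.

```python
import numpy as np

def simulate_star(adj, p_ext, p_peer, p_adopt, steps, rng):
    """Toy STAR spread. States: 0 = Susceptible, 1 = Tattler,
    2 = Adopter, 3 = Removed. Tattlers spread information without
    adopting; both tattlers and adopters influence their neighbours."""
    n = adj.shape[0]
    state = np.zeros(n, dtype=int)
    for _ in range(steps):
        influencing = ((state == 1) | (state == 2)).astype(int)
        exposed = adj @ influencing              # number of influencing neighbours
        for i in range(n):
            if state[i] == 0:
                peer = 1.0 - (1.0 - p_peer) ** exposed[i]
                if rng.random() < p_ext or rng.random() < peer:
                    # an influenced susceptible becomes an adopter or a tattler
                    state[i] = 2 if rng.random() < p_adopt else 1
            elif state[i] in (1, 2) and rng.random() < 0.1:
                state[i] = 3                     # illustrative removal rate
    return state
```

Deleting edges from `adj` mimics the missing-edge robustness experiments described above: the external term `p_ext` partially absorbs peer influence lost to unobserved edges.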
Ultrahigh‐dimensional data analysis has advanced greatly in recent years. When the data are stored across multiple clients that can be connected to each other only through a network structure, ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, where the parameters of interest are estimated over a large number of devices connected by a network structure, without data sharing. Each local machine runs gradient ascent in parallel to obtain estimators via sparsity‐restricted constrained methods. A global model is then obtained by aggregating each machine's information via an alternating direction method of multipliers (ADMM) with a concave pairwise fusion penalty between different machines across the network. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity pattern and provide estimates of all regression coefficients simultaneously. Under mild conditions, we show the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.
Wei Dong and Sanying Feng, "Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning", Stat, 2024-04-05, doi:10.1002/sta4.669.
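To show the split-and-agree mechanics that ADMM brings to federated estimation, here is a global-consensus ADMM toy for a lasso-penalised least squares shared across machines. The paper's method is fully decentralised with a concave pairwise fusion penalty over a network; this centralised-consensus sketch only conveys the general flavour.

```python
import numpy as np

def consensus_admm(datasets, lam=0.1, rho=1.0, iters=50):
    """Global-consensus ADMM: each 'machine' k holds its own (A, b),
    solves a local ridge-regularised least squares, and all machines
    agree on a shared sparse consensus vector z."""
    K = len(datasets)
    p = datasets[0][0].shape[1]
    x = np.zeros((K, p))
    u = np.zeros((K, p))
    z = np.zeros(p)
    for _ in range(iters):
        for k, (A, b) in enumerate(datasets):
            # x-update: solve (A'A + rho I) x_k = A'b + rho (z - u_k)
            x[k] = np.linalg.solve(A.T @ A + rho * np.eye(p),
                                   A.T @ b + rho * (z - u[k]))
        # z-update: soft-thresholding of the averaged iterates (lasso prox)
        v = (x + u).mean(axis=0)
        z = np.sign(v) * np.maximum(np.abs(v) - lam / (rho * K), 0.0)
        u += x - z                     # dual (running residual) update
    return z
```

Each machine touches only its own `(A, b)`; only the local estimates `x[k]` and duals `u[k]` are exchanged, which is the privacy argument behind this family of methods.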