Stat: Latest Articles

Generalised minimum moment aberration for designs with both qualitative and quantitative factors
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-05-01 | DOI: 10.1002/sta4.684
Yao Xiao, Na Zou, Hong Qin, Kang Wang
The minimum moment aberration and the minimum Lee‐moment aberration criteria are two popular conceptually simple and computationally cheap criteria for selecting good designs. However, the minimum moment aberration is suitable for qualitative factors, and the minimum Lee‐moment aberration cannot distinguish some designs with high‐level quantitative factors. In this paper, the minimum absolute‐moment aberration criterion is proposed to compare and select designs with multi‐level quantitative factors. We validate the statistical justifications of this criterion from theoretical and numerical aspects. Furthermore, we extend the minimum absolute‐moment aberration criterion into screening designs with both qualitative and quantitative factors, naming the new criterion as the minimum mixed‐moment aberration criterion. Then we utilise a numerical study to compare and evaluate the performance of some popular designs with both qualitative and quantitative factors in computer experiments.
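As a point of reference for the criteria named above, the sketch below computes the classical power moments behind minimum moment aberration for qualitative factors: the average, over all pairs of runs, of the t-th power of the number of factors on which the two runs share a level. It illustrates only that baseline criterion, not the absolute‐moment or mixed‐moment criteria proposed in the paper, whose definitions the abstract does not give. The function name and the toy designs are hypothetical.

```python
from itertools import combinations


def power_moment(design, t):
    """Average t-th power of pairwise row coincidences (illustrative)."""
    pairs = list(combinations(design, 2))
    total = 0.0
    for row_a, row_b in pairs:
        # Coincidence count: number of columns where the two runs share a level.
        coincidences = sum(1 for a, b in zip(row_a, row_b) if a == b)
        total += coincidences ** t
    return total / len(pairs)


# Hypothetical toy designs: a 2x2 full factorial and a design with repeated runs.
full_factorial = [(0, 0), (0, 1), (1, 0), (1, 1)]
with_repeat = [(0, 0), (0, 0), (1, 1), (1, 1)]

for t in (1, 2):
    print(t, power_moment(full_factorial, t), power_moment(with_repeat, t))
```

Smaller moments are preferred, compared sequentially in t. In this toy case the two designs tie on the first moment and the full factorial has the smaller second moment.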
Citations: 0
A sparse empirical Bayes approach to high‐dimensional Gaussian process‐based varying coefficient models
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-20 | DOI: 10.1002/sta4.678
Myungjin Kim, Gyuhyeong Goh
Despite the increasing importance of high‐dimensional varying coefficient models, the study of their Bayesian versions is still in its infancy. This paper contributes to the literature by developing a sparse empirical Bayes formulation that addresses the problem of high‐dimensional model selection in the framework of Bayesian varying coefficient modelling under Gaussian process (GP) priors. To break the computational bottleneck of GP‐based varying coefficient modelling, we introduce the low‐cost computation strategy that incorporates linear algebra techniques and the Laplace approximation into the evaluation of the high‐dimensional posterior model distribution. A simulation study is conducted to demonstrate the superiority of the proposed Bayesian method compared to an existing high‐dimensional varying coefficient modelling approach. In addition, its applicability to real data analysis is illustrated using yeast cell cycle data.
Citations: 0
The value of flexible funding for collaborative biostatistics units in universities and academic medical centres
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-20 | DOI: 10.1002/sta4.679
Emily Slade, Sarah Jane K. Robbins, Kristen J. McQuerry, Anthony A. Mangino
Collaborative biostatistics units within universities and academic medical centres operate under a wide range of different funding models; common to many of these models is the challenge of allocating time to activities that are not linked to a specific research project, such as professional development, mentorship and administrative tasks. The purpose of this paper is to describe a proposed model for ‘flexible funding’, that is, funding that is not linked to a specific research project, within a collaborative biostatistics unit and to detail the benefits and challenges associated with the proposed model. We present results from a qualitative study representing the perspectives of collaborative biostatisticians working under the proposed flexible funding model. In addition to providing examples of activities undertaken as part of time allocated to flexible funding, the qualitative results reveal several benefits of flexible funding both for a collaborative biostatistician (e.g., job satisfaction and professional development) and for the collaborative biostatistics unit as a whole (e.g., retention, process improvement, and leadership).
Citations: 0
Using extreme value theory to evaluate the leading pedestrian interval road safety intervention
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-18 | DOI: 10.1002/sta4.676
Nicola Hewett, Lee Fawcett, Andrew Golightly, Neil Thorpe
Improving road safety is hugely important, with the number of deaths on the world's roads remaining unacceptably high; an estimated 1.3 million people die each year as a result of road traffic collisions. Current practice for treating collision hotspots is almost always reactive: once a threshold level of collisions has been exceeded during some pre‐determined observation period, treatment is applied (e.g., road safety cameras). Traffic collisions are rare, so prolonged observation periods are necessary. However, traffic conflicts are more frequent and come at a fraction of the social cost; hence, traffic conflict before/after studies can be conducted over shorter time periods. We investigate the effect of implementing the leading pedestrian interval treatment at signalised intersections as a safety intervention in a city in North America. Pedestrian‐vehicle traffic conflict data were collected from treatment and control sites during the before and after periods. We implement a before/after study on post‐encroachment times (PETs), where small PET values denote ‘near‐misses’. Hence, extreme value theory is employed to model extremes of our PET processes, with adjustments to the usual modelling framework to account for temporal dependence and treatment effects.
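To make the idea of treating small PETs as extremes concrete, the sketch below converts PETs below a threshold into positive exceedances and fits a generalised Pareto distribution by the method of moments. This is a simplified stand-in, not the authors' model: it ignores temporal dependence and treatment effects, and the method-of-moments fit is swapped in for whatever estimation the paper uses. The PET values, the 1.5 s threshold, and both function names are hypothetical.

```python
from statistics import mean, variance


def near_miss_exceedances(pets, threshold):
    """Return threshold - PET for every PET below the threshold."""
    # Small PETs are the dangerous ones, so the extremes sit in the lower tail.
    return [threshold - p for p in pets if p < threshold]


def gpd_moment_estimates(exceedances):
    """Method-of-moments estimates (shape, scale) for a generalised Pareto fit."""
    m = mean(exceedances)
    v = variance(exceedances)  # sample variance, needs at least two points
    ratio = m * m / v
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * m * (1.0 + ratio)
    return shape, scale


# Hypothetical PETs in seconds and a hypothetical 1.5 s near-miss threshold.
pets = [0.5, 1.0, 3.0, 4.0, 0.2, 2.5]
excess = near_miss_exceedances(pets, threshold=1.5)
shape, scale = gpd_moment_estimates(excess)
print(excess, shape, scale)
```

The moment estimates rely on the generalised Pareto mean scale / (1 - shape) and are only meaningful when the shape is below 1/2, so that the variance exists. A before/after comparison would fit the two periods separately or include a treatment covariate.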
Citations: 0
The data science discovery program: A model for data science consulting in higher education
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-18 | DOI: 10.1002/sta4.677
C. Taylor Brown, Megan Mehta, Mahathi Ryali, Xiaoran Dong, Iliya Shadfar, Jacqueline Dominquez Davalos, Aaron Culich, Anthony Suen
As one of the largest data science research incubator initiatives in the country, the University of California, Berkeley's Data Science Discovery Program serves as a case study for a scalable and sustainable model of data science consulting in higher education. This case contributes to the broader literature on data science consulting in higher education by analysing the programme's development, institutional influences; staffing and structural model; and defining features, which may prove instructive to similar programmes at other institutions. The programme is characterised by a unique structure of undergraduate consultations led by graduate student mentorship and governance; a streamlined, multidepartmental model that facilitates scalability and sustainability; and diverse modes for undergraduate consulting—including one‐on‐one ad‐hoc data science consultations, extended data science project development and management, peer mentorship and data science workshop instruction. This case demonstrates that universities may be able to initiate a low‐stakes, small‐scale data science consulting initiative and then progressively scale up the project in collaboration with multiple departments and organisations across campus.
Citations: 0
Utilizing latent connectivity among mediators in high-dimensional mediation analysis
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-16 | DOI: 10.1002/sta4.675
Jia Yuan Hu, Marley DeSimone, Qing Wang
Mediation analysis intends to unveil the underlying relationship between an outcome variable and an exposure variable through one or more intermediate variables called mediators. In recent decades, research on mediation analysis has been focusing on multivariate mediation models, where the number of mediating variables is possibly of high dimension. This paper concerns high-dimensional mediation analysis and proposes a three-step algorithm that extracts and utilizes inter-connectivity among candidate mediators. More specifically, the proposed methodology starts with a screening procedure to reduce the dimensionality of the initial set of candidate mediators, followed by a penalized regression model that incorporates both parameter- and group-wise regularization, and ends with fitting a multivariate mediation model and identifying active mediating variables through a joint significance test. To showcase the performance of the proposed algorithm, we conducted two simulation studies in high-dimensional and ultra-high-dimensional settings, respectively. Furthermore, we demonstrate the practical applications of the proposal using a real data set that uncovers the possible impact of environmental toxicants on women's gestational age at delivery through 61 biomarkers that belong to 7 biological pathways.
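The last step of the algorithm described above, the joint significance test, can be sketched as follows: a mediator is declared active only if both the exposure-to-mediator path and the mediator-to-outcome path are significant. The sketch assumes the two sets of p-values have already been obtained from the fitted mediation model; the screening and penalised regression steps are not shown, no multiplicity correction is applied, and all names and p-values are hypothetical.

```python
def joint_significance(p_exposure_to_mediator, p_mediator_to_outcome, alpha=0.05):
    """Flag mediators whose two path p-values are both below alpha."""
    active = []
    for name, p_a in p_exposure_to_mediator.items():
        p_b = p_mediator_to_outcome[name]
        # The joint test rejects only if both paths are significant,
        # which is the same as comparing max(p_a, p_b) with alpha.
        if max(p_a, p_b) < alpha:
            active.append(name)
    return active


# Hypothetical p-values for three candidate mediators.
p_a = {"M1": 0.001, "M2": 0.20, "M3": 0.03}
p_b = {"M1": 0.010, "M2": 0.001, "M3": 0.04}
print(joint_significance(p_a, p_b))  # prints ['M1', 'M3']
```

M2 is dropped even though its mediator-to-outcome path is strongly significant, because the exposure does not appear to affect it.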
Citations: 0
High‐dimensional feature screening for nonlinear associations with survival outcome using restricted mean survival time
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-07 | DOI: 10.1002/sta4.673
Yaxian Chen, Kwok Fai Lam, Zhonghua Liu
Feature screening is an important tool in analysing ultrahigh‐dimensional data, particularly in omics and oncology studies. However, most attention has been focused on identifying features that have a linear or monotonic impact on the response variable. Detecting a sparse set of variables that have a nonlinear or nonmonotonic relationship with the response variable is still a challenging task. To fill the gap, this paper proposes a robust model‐free screening approach for right‐censored survival data by providing a new perspective of quantifying the covariate effect on the restricted mean survival time, rather than the routinely used hazard function. The proposed measure, based on the difference between the restricted mean survival time of covariate‐stratified and overall data, is able to identify comprehensive types of associations including linear, nonlinear, nonmonotone and even local dependencies like change points. The sure screening property is established, and a more flexible iterative screening procedure is developed to increase the accuracy of the variable screening. Simulation studies are carried out to demonstrate the superiority of the proposed method in selecting important features with a complex association with the response variable. The potential of applying the proposed method to handle interval‐censored failure time data has also been explored in simulations, and the results have been promising. The method is applied to a breast cancer dataset to identify potential prognostic factors, which reveals potential associations between breast cancer and lymphoma.
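The quantity at the centre of this approach, the restricted mean survival time (RMST), is the area under the survival curve up to a horizon tau. The sketch below estimates it from right‐censored data with the Kaplan-Meier curve and then compares each covariate stratum with the pooled data. The gap statistic is only an illustration of "stratified versus overall RMST"; the exact form of the authors' measure is not given in the abstract, and `rmst_gap`, the grouping, and the data are hypothetical.

```python
def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau]; events are 1 (death) or 0 (censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, prev_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = leaving = 0
        # Group tied observations at the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - prev_t)
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= leaving
        prev_t = t
    # The curve is carried forward at its last value up to tau.
    return area + surv * (tau - prev_t)


def rmst_gap(times, events, groups, tau):
    """Largest |stratum RMST - overall RMST| over the covariate strata."""
    overall = rmst(times, events, tau)
    gaps = []
    for g in set(groups):
        idx = [k for k, label in enumerate(groups) if label == g]
        stratum = rmst([times[k] for k in idx], [events[k] for k in idx], tau)
        gaps.append(abs(stratum - overall))
    return max(gaps)


# Hypothetical data: one covariate split into two strata.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 1]
groups = ["low", "low", "low", "high", "high", "high"]
print(rmst(times, events, tau=5), rmst_gap(times, events, groups, tau=5))
```

A covariate unrelated to survival should give stratum RMSTs close to the overall value, so a screening rule would rank covariates by the size of this gap. Because each stratum is compared with the pooled data rather than modelled, the gap can also pick up nonmonotone patterns when more than two strata are used.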
Citations: 0
Double verification for two‐sample covariance matrices test
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-07 | DOI: 10.1002/sta4.670
Wenming Sun, Lingfeng Lyu, Xiao Guo
This paper explores testing the equality of two covariance matrices under high‐dimensional settings. Existing test statistics are usually constructed based on the squared Frobenius norm or the elementwise maximum norm. However, the former may experience power loss when handling sparse alternatives, while the latter may have a poor performance against dense alternatives. In this paper, with a novel framework, we introduce a double verification test statistic designed to be powerful against both dense and sparse alternatives. Additionally, we propose an adaptive weight test statistic to enhance power. Furthermore, we present an analysis of the asymptotic size and power of the proposed test. Simulation results demonstrate the satisfactory performance of our proposed method.
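The contrast between the two norms can be seen on a small example. The sketch below evaluates the squared Frobenius norm and the elementwise maximum norm of the difference between two covariance matrices, once for a sparse alternative (one pair of entries changed) and once for a dense alternative (every off-diagonal entry changed slightly). These are raw plug-in distances on known matrices, not the standardised test statistics of the literature and not the double verification statistic proposed in the paper; the matrices and function names are hypothetical.

```python
def frobenius_sq(cov_a, cov_b):
    """Squared Frobenius norm of the difference: sum of squared entries."""
    return sum((a - b) ** 2 for ra, rb in zip(cov_a, cov_b) for a, b in zip(ra, rb))


def max_abs(cov_a, cov_b):
    """Elementwise maximum norm of the difference."""
    return max(abs(a - b) for ra, rb in zip(cov_a, cov_b) for a, b in zip(ra, rb))


p = 4
identity = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

# Sparse alternative: a single off-diagonal pair differs by 0.3.
sparse_alt = [row[:] for row in identity]
sparse_alt[0][1] = sparse_alt[1][0] = 0.3

# Dense alternative: every off-diagonal entry differs by 0.2.
dense_alt = [[1.0 if i == j else 0.2 for j in range(p)] for i in range(p)]

for name, alt in (("sparse", sparse_alt), ("dense", dense_alt)):
    print(name, frobenius_sq(identity, alt), max_abs(identity, alt))
```

The Frobenius distance is larger for the dense alternative because it accumulates many small differences, while the maximum norm is larger for the sparse alternative because it sees only the single biggest entry. A test built on one norm alone therefore tends to lose power against the other kind of alternative.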
Citations: 0
STAR: Spread of innovations on graph structures with the Susceptible‐Tattler‐Adopter‐Removed model
IF 1.7 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2024-04-05 | DOI: 10.1002/sta4.671
Riccardo Parviero, Kristoffer H. Hellton, Geoffrey Canright, Ida Scheel
Adoptions of a new innovation such as a product, service or idea are typically driven both by peer‐to‐peer social interactions and by external influence. Social graphs are usually used to efficiently model the peer‐to‐peer interactions, where new adopters influence their peers to also adopt the innovation. However, the influence to adopt may also spread through individuals close to the adopters, known as tattlers, who only share information regarding the innovation. We extend an inhomogeneous Poisson process model accounting for both external and peer‐to‐peer influence to include an optional tattling stage, and we term the extension the Susceptible‐Tattler‐Adopter‐Removed (STAR) model. In an extensive simulation study, the proposed model is shown to be stable and identifiable and to accurately identify tattling when present. Further, using simulations, we show that both inference and prediction of the STAR model are quite robust against missing edges in the social graph, a common situation in real‐world data. Simulations and theoretical considerations demonstrate that, when edges are missing, the STAR model is able to accurately estimate the shares attributed to the external and internal sources of influence. Furthermore, the STAR model may be used to improve the inference of the external and viral parameters and subsequent predictions even when tattling is not part of the real data‐generating mechanism.
Citations: 0
Network alternating direction method of multipliers for ultrahigh‐dimensional decentralised federated learning
IF 1.7 Zone 4 Mathematics Q4 Mathematics Pub Date: 2024-04-05 DOI: 10.1002/sta4.669
Wei Dong, Sanying Feng
Ultrahigh‐dimensional data analysis has seen great progress in recent years. When the data are stored across multiple clients and the clients can communicate with each other only through a network structure, implementing ultrahigh‐dimensional analysis can be numerically challenging or even infeasible. In this work, we study decentralised federated learning for ultrahigh‐dimensional data analysis, where the parameters of interest are estimated across a large number of devices connected by a network structure, without data sharing. On the local machines, gradient ascent is run in parallel to obtain estimators via sparsity‐restricted constrained methods. We then obtain a global model by aggregating each machine's information via an alternating direction method of multipliers (ADMM), using a concave pairwise fusion penalty between different machines through the network structure. The proposed method can mitigate the privacy risks of traditional machine learning, recover the sparsity and provide estimates of all regression coefficients simultaneously. Under mild conditions, we show the convergence and estimation consistency of our method. The promising performance of the method is supported by both simulated and real data examples.
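As a rough illustration of the local step described in the abstract, the sketch below runs a gradient update followed by hard thresholding on a single machine's data. It deliberately swaps in plain iterative hard thresholding for least squares, which is a simpler technique than the paper's method, and it does not implement the ADMM aggregation or the concave pairwise fusion penalty. The names `hard_threshold` and `local_sparse_fit`, the squared-error loss and the fixed step size are all assumptions.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def local_sparse_fit(X, y, k, step, n_iter=100):
    """Iterative hard thresholding for least squares on one machine's data.

    X: list of n rows, each a list of p floats; y: list of n floats.
    k: maximum number of non-zero coefficients (the sparsity restriction).
    step: gradient step size, chosen by the caller (hypothetical parameter).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals X @ beta - y for the current estimate.
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the loss (1 / 2n) * ||y - X beta||^2.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step, then projection onto vectors with at most k non-zeros.
        beta = hard_threshold([beta[j] - step * grad[j] for j in range(p)], k)
    return beta


# Usage: with an identity design, the fit keeps the k largest responses.
X = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [3.0, -1.0, 0.5, -4.0]
beta_hat = local_sparse_fit(X, y, k=2, step=4.0)
```

The sketch minimises a squared-error loss, which plays the same role as gradient ascent on a Gaussian log-likelihood. In a decentralised setting, each machine would run a step like this on its own data before any aggregation across the network.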
Citations: 0