Model free feature screening for large scale and ultrahigh dimensional survival data

Annals of the Institute of Statistical Mathematics (IF 0.6; Zone 4, Mathematics; JCR Q3, Statistics & Probability). Pub Date: 2024-10-19. DOI: 10.1007/s10463-024-00912-x
Yingli Pan, Haoyu Wang, Zhan Liu
Annals of the Institute of Statistical Mathematics, volume 77, issue 1, pages 155-190. Journal Article. https://link.springer.com/article/10.1007/s10463-024-00912-x
Citations: 0

Abstract

This paper provides a novel perspective on feature screening in the analysis of high-dimensional right-censored large-p-large-N survival data. The research introduces a distributed feature screening method known as Aggregated Distance Correlation Screening (ADCS). The proposed screening framework involves expressing the distance correlation measure as a function of multiple component parameters, each of which can be estimated in a distributed manner using a natural U-statistic from data segments. By aggregating the component estimates, a final correlation estimate is obtained, facilitating feature screening. Importantly, this approach does not necessitate any specific model specification for responses or predictors and is effective with heavy-tailed data. The study establishes the consistency of the proposed aggregated correlation estimator \(\widetilde{\omega }_{j}\) under mild conditions and demonstrates the sure screening property of the ADCS. Empirical results from both simulated and real datasets confirm the efficacy and practicality of the ADCS approach proposed in this paper.
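
The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard decomposition of squared distance covariance into pairwise and triple expectations, estimates each component with a U-statistic on every data segment, averages the components across segments (weighted by segment size, which is an assumption), and only then plugs them into the correlation formula. It also treats the response as fully observed, so the paper's handling of right censoring is not reproduced here. All function names (`component_u_stats`, `dcor_from_components`, `aggregated_dcor`, `screen`) are hypothetical.

```python
import numpy as np


def component_u_stats(x, y):
    """U-statistic estimates of the component parameters on one segment.

    With a_ij = |x_i - x_j| and b_ij = |y_i - y_j|, the components are
    E[a b], E[a], E[b], E[a_ij b_ik] and the analogues for (x, x) and (y, y).
    """
    n = len(x)
    A = np.abs(x[:, None] - x[None, :])  # pairwise distances, zero diagonal
    B = np.abs(y[:, None] - y[None, :])
    n2 = n * (n - 1)          # number of ordered pairs i != j
    n3 = n2 * (n - 2)         # number of ordered triples, all distinct
    rA, rB = A.sum(axis=1), B.sum(axis=1)

    def pair(P, Q):
        return (P * Q).sum() / n2

    def triple(rP, rQ, P, Q):
        # sum over distinct (i, j, k) of P_ij * Q_ik: remove the j == k terms
        return (rP @ rQ - (P * Q).sum()) / n3

    return np.array([
        pair(A, B), A.sum() / n2, B.sum() / n2, triple(rA, rB, A, B),
        pair(A, A), triple(rA, rA, A, A),
        pair(B, B), triple(rB, rB, B, B),
    ])


def dcor_from_components(theta):
    """Plug aggregated components into the squared distance correlation."""
    ab, a, b, abc, aa, aac, bb, bbc = theta
    dcov2 = ab + a * b - 2.0 * abc
    dvar_x = aa + a * a - 2.0 * aac
    dvar_y = bb + b * b - 2.0 * bbc
    denom = np.sqrt(max(dvar_x, 0.0) * max(dvar_y, 0.0))
    return max(dcov2, 0.0) / denom if denom > 0 else 0.0


def aggregated_dcor(x, y, n_segments):
    """Average per-segment component estimates, then form the correlation."""
    xs = np.array_split(np.asarray(x, dtype=float), n_segments)
    ys = np.array_split(np.asarray(y, dtype=float), n_segments)
    comps = np.array([component_u_stats(xk, yk) for xk, yk in zip(xs, ys)])
    sizes = np.array([len(xk) for xk in xs], dtype=float)
    theta = np.average(comps, axis=0, weights=sizes)  # assumed weighting
    return dcor_from_components(theta)


def screen(X, y, n_segments, d):
    """Rank features by aggregated utility and keep the top d indices."""
    omega = np.array([aggregated_dcor(X[:, j], y, n_segments)
                      for j in range(X.shape[1])])
    return np.argsort(omega)[::-1][:d], omega
```

In this sketch each segment needs only its own pairwise distance matrices, so memory scales with the segment size rather than with N, and only eight numbers per feature have to be communicated from each segment. Calling `screen(X, y, n_segments=4, d=5)` on an N-by-p array `X` returns the indices of the five features with the largest estimated utility together with the full utility vector.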

Source journal
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 39
Review time: 6-12 weeks
About the journal: Annals of the Institute of Statistical Mathematics (AISM) aims to provide a forum for open communication among statisticians, and to contribute to the advancement of statistics as a science to enable humans to handle information in order to cope with uncertainties. It publishes high-quality papers that shed new light on the theoretical, computational and/or methodological aspects of statistical science. Emphasis is placed on (a) development of new methodologies motivated by real data, (b) development of unifying theories, and (c) analysis and improvement of existing methodologies and theories.