Detection of influential observations in high-dimensional survival data

P. Divya, S. Suresh
DOI: 10.1080/23737484.2023.2266404
Journal: Communications in Statistics Case Studies Data Analysis and Applications, vol. 7, no. 1
Published: 2023-10-13
Citations: 0

Abstract

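Survival analysis is a statistical technique used mainly to analyze time-to-event data. Identifying influential observations is important because it can lead to the discovery of new prognostic factors. In survival analysis, influential observations typically correspond to individuals whose survival times are extremely short or extremely long compared with those of others. In particular, when the data have more covariates than observations, classical approaches break down. Dimensionality reduction is therefore needed to select appropriate variables, and popular techniques such as the LASSO and the elastic net have been used for this purpose. This paper considers high-dimensional breast cancer data and reduces its dimensionality using variable selection methods. The rank product test and martingale residuals are then used to identify influential observations, and a resampling technique is used to validate the consistency and robustness of the methods. The novelty of this paper lies in using a Random Survival Forest (RSF) to compare prediction accuracy on datasets with and without outliers for different training fractions. The RSF results show that the LASSO approach outperforms the other methods in the absence of outliers. We therefore suggest first reducing dimensionality with the LASSO variable selection technique and then removing likely outliers to improve the performance of classification algorithms.

Keywords: survival analysis; variable selection methods; martingale residuals; rank product test; random survival forest

Disclosure statement: The authors declare that there is no conflict of interest regarding the publication of this paper.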
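The abstract names two diagnostics, martingale residuals and the rank product test, without giving formulas. The sketch below shows one common way to compute both, assuming a Cox proportional hazards model whose linear predictor x_i'β has already been fitted (for example on LASSO-selected covariates) and the Breslow estimator for the baseline cumulative hazard. The martingale residual is r_i = δ_i − Ĥ_0(t_i) exp(x_i'β), and the rank product combines K outlyingness scores as the geometric mean of each subject's per-criterion ranks. The function names, the use of |r_i| as the score, the two hypothetical coefficient vectors, the simulated data and the top-3 cut-off are all illustrative assumptions, not the authors' settings.

```python
import numpy as np


def breslow_cumhaz(time, event, risk):
    """Breslow baseline cumulative hazard H0(t), evaluated at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    event_times = np.unique(time[event == 1])  # distinct event times t_j
    increments = np.array([
        # d_j / sum over the risk set R(t_j) of exp(x_k' beta)
        event[time == t].sum() / risk[time >= t].sum()
        for t in event_times
    ])
    cum = np.concatenate(([0.0], np.cumsum(increments)))
    # the count of event times <= t_i indexes H0(t_i); no earlier events -> H0 = 0
    return cum[np.searchsorted(event_times, time, side="right")]


def martingale_residuals(time, event, lin_pred):
    """r_i = delta_i - H0(t_i) * exp(x_i' beta); values lie in (-inf, 1]."""
    risk = np.exp(np.asarray(lin_pred, dtype=float))
    return np.asarray(event, dtype=float) - breslow_cumhaz(time, event, risk) * risk


def rank_product(scores):
    """Geometric mean of per-column ranks; rank 1 = largest score (most outlying)."""
    scores = np.asarray(scores, dtype=float)
    # double argsort turns each column's scores into ranks 0..n-1, then shift to 1..n
    ranks = np.argsort(np.argsort(-scores, axis=0, kind="stable"), axis=0, kind="stable") + 1
    return np.exp(np.log(ranks).mean(axis=1))


# Simulated toy data (assumption): 40 subjects, 3 covariates, random censoring.
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=(n, 3))
time = rng.exponential(scale=np.exp(-0.5 * x[:, 0]))
event = (rng.uniform(size=n) < 0.75).astype(int)

# Hypothetical coefficient vectors standing in for Cox models fitted on
# LASSO- and elastic-net-selected covariates (they are NOT fitted here).
beta_lasso = np.array([0.5, 0.0, 0.0])
beta_enet = np.array([0.4, 0.1, 0.0])

res = np.column_stack([
    martingale_residuals(time, event, x @ beta_lasso),
    martingale_residuals(time, event, x @ beta_enet),
])
rp = rank_product(np.abs(res))          # one outlyingness score per model
candidates = np.argsort(rp)[:3]         # hypothetical cut-off: three smallest rank products
print(candidates, rp[candidates])
```

Because the Breslow estimator is computed at the same β as the residuals, each column of residuals sums to zero. Strongly negative residuals mark subjects who survived far longer than the model predicts, and values near 1 mark events that occurred much earlier than expected; a small rank product means a subject is consistently among the most extreme across all criteria.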
Source journal
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 29
Latest articles in this journal
The reciprocal elastic net
Detection of influential observations in high-dimensional survival data
Small area estimation of trends in household living standards in Uganda using a GMANOVA-MANOVA model and repeated surveys
Applications of a new loss and cost-based process capability index to electronic industries
A methodological framework for imputing missing spatial data at an aggregate level and guaranteeing data privacy: the AFFINITY method; implementation in the context of the official spatial Greek census data