Predicting factors and top gene identification for survival data of breast cancer

AIMS Biophysics (IF 1.1, Q4, Biophysics) Pub Date: 2023-01-01 DOI: 10.3934/biophy.2023006
Sarada Ghosh, Guruprasad Samanta, Manuel De la Sen

Abstract

For high-throughput research with biological data sets generated sequentially or by transcriptional microarrays, proteomics or other means, analytic techniques that address their high-dimensional aspects remain desirable. The computational part predicts the tendency towards mortality due to breast cancer (BC) using several classification methods, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and Decision Tree (DT), and compares the models' performances. We proceed with the RF method since it gives better accuracy than the other models considered. We also demonstrate some traditional and competing risk models, illustrate them with real data analysis, depict the nature of their curves and compare their fits using prediction error curves and the concordance index. Furthermore, two different survival splitting rules are applied in separate Random Survival Forest (RSF) models, and a ranking of breast cancer risk factors is constructed. The results show that high-level grade and diameter are the most important predictors of mortality progression in the presence of competing events of death, and that lymph nodes, age and angiography are other vital criteria. We have also implemented an RSF backward selection criterion, which enables selection of the top genes related to mortality progression due to breast cancer. This method identifies c-MYB, CDCA7, NUSAP1, BIRC5, ANGPTL4, JAG1, IL6ST and the remaining genes as mainly responsible for mortality progression due to breast cancer. In this work, R software is used to obtain and evaluate the results.
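
The concordance index mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' R code: it is a plain-Python version of Harrell's C for right-censored data, the function name `concordance_index` and the toy data are hypothetical, and it simply skips pairs with tied follow-up times and does not handle competing risks.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times: observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the shorter time is an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (assumption, not taken from the paper).
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]
c = concordance_index(times, events, risk)
print(c)  # 7 of the 8 comparable pairs are concordant: 0.875
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the index is a convenient single number for comparing the fits of different survival models.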
Source journal
AIMS Biophysics
CiteScore: 2.40
Self-citation rate: 20.00%
Articles published: 16
Review time: 8 weeks
Journal description: AIMS Biophysics is an international Open Access journal devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics. It publishes the following article types: original research articles, reviews, editorials, letters, and conference reports. AIMS Biophysics welcomes papers on topics including, but not limited to: · Structural biology · Biophysical technology · Bioenergetics · Membrane biophysics · Cellular Biophysics · Electrophysiology · Neuro-Biophysics · Biomechanics · Systems biology