Feature selection and classification approaches in gene expression of breast cancer

AIMS Biophysics (IF 1.1, Q4, Biophysics) | Pub Date: 2021-01-01 | DOI: 10.3934/biophy.2021029
Sarada Ghosh, Guruprasad Samanta, M. de La Sen
Citations: 0

Feature selection and classification approaches in gene expression of breast cancer
DNA microarray technology can monitor the expression levels of thousands of genes simultaneously, and microarray data analysis is important in the phenotype classification of diseases. In this work, the computational part predicts the tendency towards mortality by identifying features in a high-dimensional dataset and applying several classification techniques. We analyzed breast cancer transcriptional genomic data comprising 1554 transcripts captured from 272 samples. This work presents methods for gene classification using Logistic Regression (LR), Random Forest (RF) and Decision Tree (DT), and constructs a classifier that achieves higher accuracy than one trained on all features together. The performance of these methods is also compared with a dimension reduction method, namely Principal Component Analysis (PCA). Feature reduction with RF, LR and DT provides better performance than PCA. Both LR and RF identify the genes TYMP, ERS1, C-MYB and TUBA1a. However, some features corresponding to genes such as ARID4B, DNMT3A, TOX3, RGS17 and PNLIP are identified only by the LR method; these genes play a significant role in breast cancer. The simulation is carried out in R.
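The general idea of coefficient-based feature selection followed by refitting on the reduced feature set can be sketched as follows. This is a minimal illustration only, not the authors' pipeline (which the abstract says was run in R): the synthetic data, the helper names (`fit_logistic`, `select_top_k`, `make_toy_data`), the gradient-descent settings, and the choice to rank features by absolute logistic regression coefficient are all assumptions made for the example. The toy features share a comparable scale, which is what makes coefficient magnitudes roughly comparable here.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose predicted class (threshold 0.5) matches the label."""
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
             for xi in X]
    return sum(int(pred == t) for pred, t in zip(preds, y)) / len(y)


def make_toy_data(n=300, n_noise=4, seed=0):
    """Hypothetical stand-in for an expression matrix: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.0 if label == 1 else -1.0
        row = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def select_top_k(weights, k):
    """Indices (ascending) of the k features with the largest absolute coefficient."""
    ranked = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
    return sorted(ranked[:k])


X, y = make_toy_data()
w_all, b_all = fit_logistic(X, y)            # model on all features
selected = select_top_k(w_all, k=2)          # keep the two strongest features
X_sel = [[row[j] for j in selected] for row in X]
w_sel, b_sel = fit_logistic(X_sel, y)        # refit on the reduced feature set
acc_all = accuracy(X, y, w_all, b_all)
acc_sel = accuracy(X_sel, y, w_sel, b_sel)
print("selected features:", selected)
print("accuracy, all features:", round(acc_all, 3))
print("accuracy, selected features:", round(acc_sel, 3))
```

The accuracies printed here are computed on the training data, so they say nothing about generalization; a real analysis would evaluate on held-out samples or with cross-validation before comparing the reduced model against the full one or against PCA.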
Source journal: AIMS Biophysics (category: Biophysics)
CiteScore: 2.40
Self-citation rate: 20.00%
Articles published: 16
Review time: 8 weeks
Journal description: AIMS Biophysics is an international Open Access journal devoted to publishing peer-reviewed, high-quality, original papers in the field of biophysics. It publishes the following article types: original research articles, reviews, editorials, letters, and conference reports. AIMS Biophysics welcomes papers on topics including, but not limited to:
· Structural biology
· Biophysical technology
· Bioenergetics
· Membrane biophysics
· Cellular biophysics
· Electrophysiology
· Neuro-biophysics
· Biomechanics
· Systems biology