利用基因表达数据预测癌症转移的L1惩罚深度学习模型

IF 6.3 2区物理与天体物理 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Learning Science and Technology Pub Date : 2023-06-01 DOI:10.1088/2632-2153/acd987

Jaeyoon Kim, Minhyeok Lee, Junhee Seok

{"title":"利用基因表达数据预测癌症转移的L1惩罚深度学习模型","authors":"Jaeyoon Kim, Minhyeok Lee, Junhee Seok","doi":"10.1088/2632-2153/acd987","DOIUrl":null,"url":null,"abstract":"Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, the problem of overfitting occurs frequently while training deep learning models using gene expression data because they contain a large number of genes and the sample size is rather small. To address overfitting, several regularization methods are implemented, such as L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under curve (AUC) scores are evaluated and then compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. Considering results, DeepCME demonstrates the highest average AUC scores in most cross-validation cases, and the average AUC score of DeepCME is 0.754, which is approximately 12.9% higher than SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on DeepCME results and some are discussed in further detail considering the reports from some previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop the cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and be applied to other types of cancer as well after further research.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":" ","pages":""},"PeriodicalIF":6.3000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Deep learning model with L1 penalty for predicting breast cancer metastasis using gene expression data\",\"authors\":\"Jaeyoon Kim, Minhyeok Lee, Junhee Seok\",\"doi\":\"10.1088/2632-2153/acd987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, the problem of overfitting occurs frequently while training deep learning models using gene expression data because they contain a large number of genes and the sample size is rather small. To address overfitting, several regularization methods are implemented, such as L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under curve (AUC) scores are evaluated and then compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. Considering results, DeepCME demonstrates the highest average AUC scores in most cross-validation cases, and the average AUC score of DeepCME is 0.754, which is approximately 12.9% higher than SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on DeepCME results and some are discussed in further detail considering the reports from some previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop the cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and be applied to other types of cancer as well after further research.\",\"PeriodicalId\":33757,\"journal\":{\"name\":\"Machine Learning Science and Technology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Learning Science and Technology\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.1088/2632-2153/acd987\",\"RegionNum\":2,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1088/2632-2153/acd987","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 2

摘要

癌症在女性中发病率和死亡率最高；此外，它转移到其他器官会增加死亡率。由于一些研究报道了基因表达与癌症预后相关，因此利用基因表达研究癌症转移至关重要。为此，本文提出了一种新的深度神经网络结构——基于深度学习的癌症转移估计器（DeepCME），用于预测癌症转移。然而，在使用基因表达数据训练深度学习模型时，过拟合问题经常发生，因为它们包含大量基因，并且样本量相当小。为了解决过拟合问题，实现了几种正则化方法，如L1惩罚、批量归一化和丢弃。为了证明我们模型的优越性能，评估了曲线下面积（AUC）得分，然后将其与五个基线模型进行比较：逻辑回归、支持向量分类器（SVC）、随机森林、决策树和k近邻。从结果来看，DeepCME在大多数交叉验证案例中的平均AUC得分最高，DeepCME的平均AUC得分为0.754，比第二好模型SVC高出约12.9%。此外，根据DeepCME的结果，确定了与癌症转移相关的30个最重要的基因，并考虑到之前一些医学研究的报告，对其中一些基因进行了进一步详细的讨论。考虑到测量单个基因表达所涉及的高昂费用，仅使用少数关键基因开发成本效益高且时效性强的测试的能力是有价值的。基于这项研究，我们期望DeepCME在临床上用于预测癌症转移，并在进一步研究后应用于其他类型的癌症。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Deep learning model with L1 penalty for predicting breast cancer metastasis using gene expression data

Breast cancer has the highest incidence and death rate among women; moreover, its metastasis to other organs increases the mortality rate. Since several studies have reported gene expression and cancer prognosis to be related, the study of breast cancer metastasis using gene expression is crucial. To this end, a novel deep neural network architecture, deep learning-based cancer metastasis estimator (DeepCME), is proposed in this paper for predicting breast cancer metastasis. However, the problem of overfitting occurs frequently while training deep learning models using gene expression data because they contain a large number of genes and the sample size is rather small. To address overfitting, several regularization methods are implemented, such as L1 penalty, batch normalization, and dropout. To demonstrate the superior performance of our model, area under curve (AUC) scores are evaluated and then compared with five baseline models: logistic regression, support vector classifier (SVC), random forest, decision tree, and k-nearest neighbor. Considering results, DeepCME demonstrates the highest average AUC scores in most cross-validation cases, and the average AUC score of DeepCME is 0.754, which is approximately 12.9% higher than SVC, the second-best model. In addition, the 30 most significant genes related to breast cancer metastasis are identified based on DeepCME results and some are discussed in further detail considering the reports from some previous medical studies. Considering the high expense involved in measuring the expression of a single gene, the ability to develop the cost-effective and time-efficient tests using only a few key genes is valuable. Based on this study, we expect DeepCME to be utilized clinically for predicting breast cancer metastasis and be applied to other types of cancer as well after further research.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine Learning Science and Technology Computer Science-Artificial Intelligence

CiteScore

9.10

自引率

4.40%

发文量

审稿时长

5 weeks

期刊介绍： Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.