Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models.

Genomics and Informatics (JCR Q2, Agricultural and Biological Sciences). Pub Date: 2022-06-01; Epub Date: 2022-06-30; DOI: 10.5808/gi.22036

Hyunsuk Kim, Taesung Park, Jinyoung Jang, Seungyeoun Lee

Genomics and Informatics, vol. 20, no. 2, e23. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9299568/pdf/

Citations: 4

Abstract

A survival prediction model based on a Cox model was recently developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma using two nationwide databases: Surveillance, Epidemiology, and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods for survival analysis, random survival forests (RSF) and support vector machines (SVM), and compared their prediction performance with that of the Cox model on the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, we swapped the two datasets, using data from KOTUS-BP for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Across the three schemes, the Cox model, RSF, and SVM all performed better with the mixed datasets than with the unmixed datasets. With the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
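The C-index reported above measures how well a model ranks patients by risk. As an illustration only, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. It is not the authors' code, and the abstract does not say which software or C-index variant was used. The function name `harrell_c_index` and its argument names are hypothetical. For simplicity, pairs with tied observed times are skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means higher predicted risk

    Assumption (simplification): pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event; otherwise we cannot tell who survived longer.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk went with shorter survival
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    times = [5, 8, 12, 3, 10]
    events = [1, 0, 1, 1, 0]
    risk = [0.9, 0.4, 0.2, 0.8, 0.5]
    print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.644 indicates moderate discrimination.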


