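Development, Implementation and Validation of a Stochastic Prediction Model of UICC Stages for Missing Values in Large Data Sets in a Hospital Cancer Registry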

Proceedings of the International Conference on Health Informatics and Medical Application Technology. Pub Date: 2023-01-01. DOI: 10.5220/0011667700003414

S. Appelbaum, D. Krüerke, Stephan Baumgartner, M. Schenker, T. Ostermann

Volume 145 1, pages 117-123.

Citations: 0

Abstract




Cancer is still a fatal disease in many cases, despite intensive research into prevention, treatment and follow-up. In this context, an important parameter is the stage of the cancer. The TNM/UICC classification is an important method to describe a cancer. It dates back to the surgeon Pierre Denoix and is an important prognostic factor for patient survival. Unfortunately, despite its importance, the TNM/UICC classification is often poorly documented in cancer registries. The aim of this work is to investigate the possibility of predicting UICC stages using statistical learning methods based on cancer registry data. Data from the Cancer Registry Clinic Arlesheim (CRCA) were used for this analysis. The registry contains a total of 5,305 records, of which 1,539 cases were eligible for data analysis. For prediction, classification and regression trees, random forests, gradient tree boosting and logistic regression were used as statistical methods. As performance measures, the mean misclassification error (mmce), the area under the receiver operating characteristic curve (AUC) and Cohen's kappa were applied. Misclassification rates were in the range of 28.0% to 30.4%. AUCs ranged between 0.73 and 0.80, and Cohen's kappa showed values between 0.39 and 0.44, indicating only moderate predictive performance. However, with only 1,539 records, the data set considered here was considerably smaller than those of larger cancer registries, so the results found here should be interpreted with caution.
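To make the three performance measures named in the abstract concrete, the following is a minimal Python sketch of how they can be computed. It is not the authors' implementation: the function names are illustrative, the toy labels and scores are invented and have nothing to do with the CRCA data, and the AUC shown is the plain binary version, whereas the abstract does not say how AUC was computed across several UICC stages.

```python
from collections import Counter


def mmce(y_true, y_pred):
    """Mean misclassification error: share of predictions that differ from the truth."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = 1.0 - mmce(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


def binary_auc(y_true, scores, positive):
    """Binary AUC: probability that a random positive case gets a higher score
    than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (assumption): eight cases with a true and a predicted stage.
toy_true = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
toy_pred = ["I", "II", "II", "II", "III", "IV", "IV", "IV"]

# Invented binary example (assumption): 1 = advanced stage, with model scores.
toy_binary = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]

print(mmce(toy_true, toy_pred))                   # 0.25
print(round(cohen_kappa(toy_true, toy_pred), 4))  # 0.6667
print(binary_auc(toy_binary, toy_scores, 1))      # 0.75
```

On the toy data, two of eight stages are wrong (mmce 0.25), while kappa is lower than the raw agreement of 0.75 because a quarter of the agreement would be expected by chance alone. This is why a misclassification rate near 30% can coexist with a kappa of only about 0.4, as reported in the abstract.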
