S. Appelbaum, D. Krüerke, Stephan Baumgartner, M. Schenker, T. Ostermann
Proceedings of the International Conference on Health Informatics and Medical Application Technology, pp. 117-123. Published 2023-01-01. https://doi.org/10.5220/0011667700003414
Development, Implementation and Validation of a Stochastic Prediction Model of UICC Stages for Missing Values in Large Data Sets in a Hospital Cancer Registry
Cancer is still a fatal disease in many cases, despite intensive research into prevention, treatment and follow-up. In this context, the stage of the cancer is an important parameter. The TNM/UICC classification is a central method for describing a cancer. It dates back to the surgeon Pierre Denoix and is an important prognostic factor for patient survival. Despite its importance, the TNM/UICC classification is often poorly documented in cancer registries. The aim of this work is to investigate whether UICC stages can be predicted from cancer registry data using statistical learning methods. Data from the Cancer Registry Clinic Arlesheim (CRCA) were used for this analysis. The registry contains a total of 5,305 records, of which 1,539 cases were eligible for data analysis. Four statistical methods were used for prediction: classification and regression trees, random forests, gradient tree boosting and logistic regression. Performance was measured by the mean misclassification error (mmce), the area under the receiver operating characteristic curve (AUC) and Cohen's kappa. Misclassification rates ranged from 28.0% to 30.4%, AUCs from 0.73 to 0.80 and Cohen's kappa from 0.39 to 0.44, which indicates only moderate predictive performance. However, with only 1,539 records, the data set considered here was considerably smaller than those of larger cancer registries, so the results should be interpreted with caution.
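To make two of the performance measures named in the abstract concrete, the following is a minimal sketch of how the mean misclassification error and Cohen's kappa can be computed from true and predicted stage labels. It is not the authors' implementation: the function names `mmce` and `cohen_kappa` and the toy labels are assumptions chosen for illustration, and the labels are not taken from the CRCA data. AUC is left out because it needs predicted probabilities rather than hard labels.

```python
from collections import Counter


def mmce(y_true, y_pred):
    """Mean misclassification error: share of predictions that differ from the truth."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = 1.0 - mmce(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both label vectors contain one class only")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (UICC stages I-IV), for illustration only.
toy_true = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
toy_pred = ["I", "II", "II", "II", "III", "IV", "IV", "IV"]

print(f"mmce  = {mmce(toy_true, toy_pred):.3f}")
print(f"kappa = {cohen_kappa(toy_true, toy_pred):.3f}")
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance alone, which is why a misclassification rate of about 30% can coincide with kappa values of only about 0.4.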