Classification Prediction of Lung Cancer Based on Machine Learning Method

International Journal of Healthcare Information Systems and Informatics (IF 0.4, Q4, Medical Informatics). Pub Date: 2023-11-15. DOI: 10.4018/ijhisi.333631

Dantong Li, Guixin Li, Shuang Li, Ashley Bang


Citations: 0



Classification Prediction of Lung Cancer Based on Machine Learning Method

K-nearest neighbor imputation was used to fill in missing values for five indicators (coronary heart disease, diabetes, total cholesterol, triglycerides, and albumin), and the SMOTE algorithm was used to balance the class sizes in the data. The Relief-F algorithm removed 18 variable indicators and retained 42. The LASSO and ridge regression algorithms removed eight variable indicators and retained 52. The linear-kernel support vector machine model trained on features selected by Relief-F and LASSO achieved high prediction accuracy, recall, and AUC, and gave the best prediction results. In testing, however, the random forest trained on features selected by Relief-F and LASSO performed better than the support vector machine model. The authors conclude that the random forest model with Relief-F feature selection is the better predictor of lung cancer type. The results provide theoretical and data support for predicting lung cancer classification with machine learning methods.
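The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, the function names `knn_impute` and `smote` are hypothetical, the data are toy values, and the distance measure and neighbor counts are assumptions chosen for brevity. The first function fills a missing value with the mean of that column over the nearest rows; the second creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbors, which is the core idea of SMOTE.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None entries with the column mean over the k nearest rows.

    Distance is the root mean squared difference over the columns
    that both rows have observed (an assumption of this sketch).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor row must have column j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic samples for the minority class.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy data: the second row is missing its middle value.
rows = [
    [5.2, 1.4, 40.0],
    [5.0, None, 41.0],
    [5.1, 1.6, 39.0],
    [9.0, 4.0, 20.0],
]
print(knn_impute(rows, k=2))

# Toy minority class with three samples; generate four synthetic ones.
minority = [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
print(smote(minority, n_new=4))
```

In practice, library implementations of both steps would normally be preferred over hand-written ones, and oversampling is usually applied to the training split only so that synthetic samples do not leak into the test set.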


Source journal

International Journal of Healthcare Information Systems and Informatics (Medical Informatics)

CiteScore

3.30

Self-citation rate

0.00%

Articles published