{"title":"A hybrid learning method for distinguishing lung adenocarcinoma and squamous cell carcinoma","authors":"Anil Kumar Swain, A. Swetapadma, Jitendra Kumar Rout, Bunil Kumar Balabantaray","doi":"10.1108/dta-10-2022-0384","DOIUrl":null,"url":null,"abstract":"PurposeThe objective of the proposed work is to identify the most commonly occurring non–small cell carcinoma types, such as adenocarcinoma and squamous cell carcinoma, within the human population. Another objective of the work is to reduce the false positive rate during the classification.Design/methodology/approachIn this work, a hybrid method using convolutional neural networks (CNNs), extreme gradient boosting (XGBoost) and long-short-term memory networks (LSTMs) has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. To extract features from non–small cell lung carcinoma images, a three-layer convolution and three-layer max-pooling-based CNN is used. A few important features have been selected from the extracted features using the XGBoost algorithm as the optimal feature. Finally, LSTM has been used for the classification of carcinoma types. The accuracy of the proposed method is 99.57 per cent, and the false positive rate is 0.427 per cent.FindingsThe proposed CNN–XGBoost–LSTM hybrid method has significantly improved the results in distinguishing between adenocarcinoma and squamous cell carcinoma. The importance of the method can be outlined as follows: It has a very low false positive rate of 0.427 per cent. It has very high accuracy, i.e. 99.57 per cent. CNN-based features are providing accurate results in classifying lung carcinoma. It has the potential to serve as an assisting aid for doctors.Practical implicationsIt can be used by doctors as a secondary tool for the analysis of non–small cell lung cancers.Social implicationsIt can help rural doctors by sending the patients to specialized doctors for more analysis of lung cancer.Originality/valueIn this work, a hybrid method using CNN, XGBoost and LSTM has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. A three-layer convolution and three-layer max-pooling-based CNN is used to extract features from the non–small cell lung carcinoma images. A few important features have been selected from the extracted features using the XGBoost algorithm as the optimal feature. Finally, LSTM has been used for the classification of carcinoma types.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":" ","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2023-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-10-2022-0384","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
PurposeThe objective of the proposed work is to identify the most commonly occurring non–small cell carcinoma types, such as adenocarcinoma and squamous cell carcinoma, within the human population. Another objective of the work is to reduce the false positive rate during the classification.Design/methodology/approachIn this work, a hybrid method using convolutional neural networks (CNNs), extreme gradient boosting (XGBoost) and long-short-term memory networks (LSTMs) has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. To extract features from non–small cell lung carcinoma images, a three-layer convolution and three-layer max-pooling-based CNN is used. A few important features have been selected from the extracted features using the XGBoost algorithm as the optimal feature. Finally, LSTM has been used for the classification of carcinoma types. The accuracy of the proposed method is 99.57 per cent, and the false positive rate is 0.427 per cent.FindingsThe proposed CNN–XGBoost–LSTM hybrid method has significantly improved the results in distinguishing between adenocarcinoma and squamous cell carcinoma. The importance of the method can be outlined as follows: It has a very low false positive rate of 0.427 per cent. It has very high accuracy, i.e. 99.57 per cent. CNN-based features are providing accurate results in classifying lung carcinoma. It has the potential to serve as an assisting aid for doctors.Practical implicationsIt can be used by doctors as a secondary tool for the analysis of non–small cell lung cancers.Social implicationsIt can help rural doctors by sending the patients to specialized doctors for more analysis of lung cancer.Originality/valueIn this work, a hybrid method using CNN, XGBoost and LSTM has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. A three-layer convolution and three-layer max-pooling-based CNN is used to extract features from the non–small cell lung carcinoma images. A few important features have been selected from the extracted features using the XGBoost algorithm as the optimal feature. Finally, LSTM has been used for the classification of carcinoma types.