Stacking Ensemble Method for Early and Advanced Stage Lung Adenocarcinoma Classification Based on miRNA Expression
Authors: Adeel Khan, N. He, Irfan Tariq, Zhiyang Li
Published in: Proceedings of the 2021 10th International Conference on Bioinformatics and Biomedical Science
Publication date: 2021-10-29
DOI: 10.1145/3498731.3498742 (https://doi.org/10.1145/3498731.3498742)
Citations: 0
Abstract
Lung cancer, in its various types, is a leading cause of death worldwide. Many studies have shown that microRNA (miRNA) dysregulation can be a useful marker for a variety of cancers, including lung cancer. Successful treatment of any cancer depends on clinical expertise, treatment resources, and the stage at the time of diagnosis. We therefore sought a novel miRNA expression marker to determine the stage of lung adenocarcinoma (LUAD). In this manuscript, we propose a stacking ensemble method for classifying early- and advanced-stage LUAD using miRNA expression data. Our benchmark dataset comprised 445 early-stage and 114 advanced-stage LUAD patients. Because the benchmark dataset was imbalanced, we balanced it with the Synthetic Minority Over-sampling Technique (SMOTE). We then divided the balanced LUAD patient dataset into a training set (80%) and a testing set (20%). A Random Forest (RF) was used to select the best features (miRNA sequence expression) out of 1880 miRNAs, followed by a machine learning (ML) stacking ensemble method to classify early- and advanced-stage LUAD. Compared with the traditional ML classifier used as a baseline, the stacking ensemble method classified early- and advanced-stage LUAD more effectively, with 99% accuracy. The proposed method's precision was 92% for early-stage LUAD and 84% for advanced-stage LUAD. Similarly, its recall for early- and advanced-stage LUAD was 82% and 93%, respectively, and its F1-score was 87% and 88%, respectively. In conclusion, the results demonstrate the effectiveness of the ensemble method for classifying early- and advanced-stage LUAD using miRNA expression data. The top 10 miRNAs identified by the model can help guide treatment decisions for early- and advanced-stage LUAD and increase the chances of survival.
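The arithmetic behind the figures in the abstract can be checked with a short sketch. It recomputes the F1-score for each stage from the reported precision and recall, and estimates the dataset sizes after balancing and splitting. The function names are illustrative. The size calculation rests on an assumption: that SMOTE oversampled the minority class to a 1:1 ratio, whereas the abstract only says the dataset was balanced.

```python
# Per-class precision and recall as reported in the abstract (as fractions).
REPORTED = {
    "early": {"precision": 0.92, "recall": 0.82},
    "advanced": {"precision": 0.84, "recall": 0.93},
}


def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_split_sizes(n_majority: int, n_minority: int,
                         test_fraction: float = 0.2) -> tuple[int, int, int]:
    """Return (total, train, test) sizes after oversampling and splitting.

    ASSUMPTION: the minority class is oversampled until both classes have
    the same size (1:1). The abstract does not state the target ratio.
    """
    total = 2 * max(n_majority, n_minority)
    n_test = round(total * test_fraction)
    return total, total - n_test, n_test


# F1 per stage, derived only from the reported precision and recall.
f1 = {stage: f1_score(m["precision"], m["recall"])
      for stage, m in REPORTED.items()}

# 445 early-stage and 114 advanced-stage patients, 80/20 train/test split.
total, n_train, n_test = balanced_split_sizes(445, 114)

if __name__ == "__main__":
    for stage, value in f1.items():
        print(f"{stage}-stage F1: {value:.1%}")
    print(f"balanced total: {total}, train: {n_train}, test: {n_test}")
```

The recomputed F1-scores round to the reported 87% and 88%. Under the 1:1 assumption, the balanced dataset would contain 890 samples, split into 712 for training and 178 for testing.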