Identification of Lung Cancer in Smoker Person Using Ensemble Methods Based on Gene Expression Data
Otniel Abiezer, F. Nhita, I. Kurniawan
2022 5th International Conference of Computer and Informatics Engineering (IC2IE), 2022-09-13
DOI: 10.1109/IC2IE56416.2022.9970035 (https://doi.org/10.1109/IC2IE56416.2022.9970035)
Citation count: 1
Abstract
Cancer is a disease characterized by abnormal, uncontrolled cell growth. Lung cancer is one of the most common types of cancer, and smoking is its leading cause. Early detection, for example by low-dose CT (LDCT) screening, is essential because it allows patients to receive proper treatment sooner. However, this approach still has drawbacks. Advances in DNA microarray technology make it possible to measure the expression levels of thousands of genes in a tissue sample, and lung cancer can be identified by applying machine learning to these gene expression data. In this study, a machine learning prediction model was built using two ensemble methods, Random Forest and AdaBoost. The best model is Random Forest with 900 features, which achieves an accuracy of 0.77 and an F1 score of 0.80.
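The comparison described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. It is not the authors' pipeline: the data are synthetic (generated with `make_classification`), the function name `compare_ensembles` is hypothetical, and the feature selection step (`SelectKBest` with an ANOVA F-test) and all hyperparameters are assumptions, since the abstract only states that the best model used 900 features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def compare_ensembles(n_features_kept=900, seed=0):
    """Fit Random Forest and AdaBoost on synthetic 'expression' data.

    Returns a dict mapping model name to its test accuracy and F1 score.
    """
    # Synthetic stand-in for a microarray matrix (assumption, not real data):
    # rows are samples, columns are genes, y is cancer / no cancer.
    X, y = make_classification(
        n_samples=150, n_features=2000, n_informative=40, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Hyperparameters are illustrative assumptions.
    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "adaboost": AdaBoostClassifier(n_estimators=100, random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        # Feature selection lives inside the pipeline so that it is fitted
        # on the training split only, which avoids leaking test information.
        pipeline = make_pipeline(SelectKBest(f_classif, k=n_features_kept), model)
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return scores


if __name__ == "__main__":
    for name, result in compare_ensembles().items():
        print(name, result)
```

Because the data here are random, the printed scores say nothing about the 0.77 accuracy and 0.80 F1 score reported in the paper; the sketch only shows how two ensemble classifiers could be evaluated on the same reduced feature set.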