Microarray Data Analysis for Diagnosis of Cancer Diseases by Machine Learning algorithm
Authors: Shemim Begum, Swaraj Samanta, Salauddin Ahmed, Debasis Chakraborty
Published in: 2023 5th International Conference on Recent Advances in Information Technology (RAIT), 2023-03-03
DOI: 10.1109/RAIT57693.2023.10127091 (https://doi.org/10.1109/RAIT57693.2023.10127091)
Citations: 0
Abstract
DNA microarrays can simultaneously measure the expression levels of thousands of genes within a particular mRNA sample, providing information about the state of cells and tissues. Although these expression values are useful for classifying cancers and for understanding the mechanisms involved in the genesis of disease, only a few of these thousands of genes contribute to disease classification. Feature selection is therefore well suited to this task, since its main goal is to identify the relevant features (here, genes) efficiently. In this paper, we apply four filter Feature Selection (FS) methods, namely Mutual Information (MI), Pearson Correlation Coefficient (PCC), Chi2, and ReliefF, together with three well-known classifiers, namely Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbour (KNN), to six microarray datasets (both binary and multi-class), namely Leukemia, Lung, Lymphoma and Leukemia, Gastric, SRBCT, and Childhood Tumor, and record the resulting accuracies.
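
The filter-then-classify workflow described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. This is not the authors' code. The synthetic data from `make_classification`, the number of selected genes `k`, the classifier hyperparameters, the cross-validation setup, and the helper names `pearson_score` and `evaluate` are all assumptions made for illustration. ReliefF is left out because scikit-learn does not provide it, so a third-party implementation would be needed. The Pearson score here correlates each feature with the integer-coded label, which is only a reasonable proxy for binary problems.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def pearson_score(X, y):
    """Absolute Pearson correlation of each feature with the class label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = 1.0  # constant features end up with a score of 0
    return np.abs(Xc.T @ yc) / denom


def evaluate(X, y, k=20, cv=5):
    """Mean cross-validated accuracy for every (filter, classifier) pair."""
    filters = {
        "MI": partial(mutual_info_classif, random_state=0),
        "PCC": pearson_score,
        "Chi2": chi2,
    }
    classifiers = {
        "RF": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
        "DT": lambda: DecisionTreeClassifier(random_state=0),
        "KNN": lambda: KNeighborsClassifier(n_neighbors=5),
    }
    results = {}
    for fs_name, score_func in filters.items():
        for clf_name, make_clf in classifiers.items():
            # Scaling to [0, 1] keeps features non-negative, as chi2 requires.
            # Selection sits inside the pipeline so it is refit on each
            # training fold and never sees the held-out samples.
            pipe = make_pipeline(
                MinMaxScaler(), SelectKBest(score_func, k=k), make_clf()
            )
            scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
            results[(fs_name, clf_name)] = scores.mean()
    return results


if __name__ == "__main__":
    # Synthetic stand-in for a microarray matrix: few samples, many "genes".
    X, y = make_classification(
        n_samples=72, n_features=300, n_informative=10,
        n_redundant=0, random_state=0,
    )
    for (fs_name, clf_name), acc in evaluate(X, y).items():
        print(f"{fs_name:>4} + {clf_name:<3}: accuracy = {acc:.3f}")
```

Keeping the feature selector inside the pipeline is a deliberate choice in this sketch: selecting genes on the full dataset before cross-validation would leak label information into the test folds and inflate the reported accuracies.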