H. Saadeh, Maha K. Saadeh, W. Almobaideen, Marwan Al-Tawil
Evaluating the Optimal Number of Clusters to Identify Similar Gene Expression Patterns During Erythropoiesis
2022 International Conference on Computer, Information and Telecommunication Systems (CITS), published 2022-07-13
DOI: 10.1109/cits55221.2022.9832988
Citations: 1
Abstract
Haematopoietic stem cells (HSCs) differentiate into red blood cells (erythrocytes) through a process called erythropoiesis. During this process, global gene expression changes to reflect the current developmental stage. Unsupervised clustering aims to highlight co-expressed genes, that is, genes that share similar expression profiles. Some clustering algorithms, such as the widely used K-means, require the number of clusters as input in order to group the data by a similarity measure. Determining a sufficient number of clusters is not straightforward, and the quality of the resulting clusters depends on the number chosen. In this study, three cluster validation metrics (Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Score) were used to evaluate the clusters obtained from the different clustering algorithms applied. For the erythropoiesis data, two clusters were identified as sufficient.
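The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and uses synthetic data from `make_blobs` as a stand-in for a gene expression matrix; the helper `score_cluster_counts` and all parameter values are hypothetical and are not taken from the paper. It runs K-means for several candidate cluster counts and computes the three validation metrics for each.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_cluster_counts(X, k_values, seed=0):
    """Run K-means for each k (hypothetical helper, not from the paper).

    Returns {k: (silhouette, calinski_harabasz, davies_bouldin)}.
    """
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = (
            silhouette_score(X, labels),         # higher is better, range [-1, 1]
            calinski_harabasz_score(X, labels),  # higher is better
            davies_bouldin_score(X, labels),     # lower is better
        )
    return scores


# Assumption: synthetic stand-in for an expression matrix
# (rows = genes, columns = developmental stages), with two
# well-separated groups placed at fixed centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0.0] * 5, [10.0] * 5],
    cluster_std=1.0,
    random_state=0,
)

scores = score_cluster_counts(X, range(2, 7))

# Each metric votes for its own preferred number of clusters.
best_by_silhouette = max(scores, key=lambda k: scores[k][0])
best_by_calinski_harabasz = max(scores, key=lambda k: scores[k][1])
best_by_davies_bouldin = min(scores, key=lambda k: scores[k][2])

for k, (sil, ch, db) in scores.items():
    print(f"k={k}: silhouette={sil:.3f}, CH={ch:.1f}, DB={db:.3f}")
```

Silhouette and Calinski-Harabasz are maximised while Davies-Bouldin is minimised, so the three metrics have to be compared by their preferred k and not by raw value. On real expression data the metrics may disagree, in which case the choice of k remains a judgment call.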