Shivangi Raghav, Aastha Suri, Deepika Kumar, Aakansha Aakansha, Muskan Rathore, Sudipta Roy
A hierarchical clustering approach for colorectal cancer molecular subtypes identification from gene expression data
Intelligent Medicine, volume 4, issue 1, pages 43-51. Published 2024-02-01. DOI: 10.1016/j.imed.2023.04.002
https://www.sciencedirect.com/science/article/pii/S2667102623000396
Abstract
Background
Colorectal cancer (CRC) is the second leading cause of cancer deaths and the third most common cancer worldwide. Identifying molecular subtypes of CRC and treating patients accordingly could yield better therapeutic outcomes than treating all CRC patients alike. Previous studies have underscored both the mortality burden of CRC and the potential of subtype identification to tailor treatment and improve patient outcomes.
Methods
This study proposes an unsupervised learning approach that combines hierarchical clustering with feature selection to identify molecular subtypes, and compares its performance with that of conventional methods. The model was built on gene expression data from CRC patients obtained from Kaggle; dimension reduction was applied first, followed by Z-score-based outlier removal. Agglomerative hierarchical clustering was then used to identify molecular subtypes, and a P-value-based approach was used for feature selection. Performance was evaluated using several classifiers, including a multilayer perceptron (MLP).
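The steps described above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the dimension reduction technique, the linkage, the statistical test, or any thresholds, so PCA, Ward linkage, a one-way ANOVA F-test, and all parameter values here are assumptions, and the data are synthetic rather than the Kaggle dataset. The function names `make_synthetic_expression` and `run_pipeline` are hypothetical.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def make_synthetic_expression(n_per_group=60, n_genes=200, n_informative=40, seed=0):
    """Toy samples-by-genes matrix with three latent groups (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for group in range(3):
        block = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
        block[:, :n_informative] += 3.0 * group  # shift the informative genes per group
        blocks.append(block)
    return np.vstack(blocks)


def run_pipeline(X, n_subtypes=3, n_components=5, z_thresh=3.0, alpha=0.01, seed=0):
    # 1. Dimension reduction (assumption: standardize, then PCA).
    X_std = StandardScaler().fit_transform(X)
    X_red = PCA(n_components=n_components, random_state=seed).fit_transform(X_std)

    # 2. Z-score outlier removal: drop samples with any component beyond z_thresh.
    keep = (np.abs(zscore(X_red, axis=0)) <= z_thresh).all(axis=1)
    X_kept, X_red_kept = X[keep], X_red[keep]

    # 3. Agglomerative hierarchical clustering assigns a subtype label per sample
    #    (assumption: Ward linkage, number of subtypes fixed in advance).
    clusterer = AgglomerativeClustering(n_clusters=n_subtypes, linkage="ward")
    labels = clusterer.fit_predict(X_red_kept)

    # Hold out a test set before feature selection so the held-out samples
    # do not influence which genes are chosen.
    X_train, X_test, y_train, y_test = train_test_split(
        X_kept, labels, test_size=0.3, stratify=labels, random_state=seed
    )

    # 4. P-value-based feature selection (assumption: ANOVA F-test per gene).
    _, p_values = f_classif(X_train, y_train)
    selected = np.flatnonzero(p_values < alpha)
    if selected.size == 0:
        raise ValueError("no gene passed the p-value threshold")

    # 5. Evaluate an MLP classifier on the selected genes.
    scaler = StandardScaler().fit(X_train[:, selected])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)
    clf.fit(scaler.transform(X_train[:, selected]), y_train)
    accuracy = clf.score(scaler.transform(X_test[:, selected]), y_test)

    return {
        "n_kept": int(keep.sum()),
        "labels": labels,
        "selected": selected,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    result = run_pipeline(make_synthetic_expression())
    print("samples kept:", result["n_kept"])
    print("genes selected:", result["selected"].size)
    print("MLP test accuracy:", round(result["accuracy"], 3))
```

Because the classifier is trained on labels produced by the clustering step, its accuracy measures how reproducibly the clusters can be recovered from the selected genes, not agreement with any clinically validated subtype. Accuracy on this synthetic data says nothing about the 89% reported in the paper.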
Results
The proposed methodology outperformed conventional methods; the MLP classifier achieved the highest accuracy, 89%, after feature selection. The model identified molecular subtypes of CRC and distinguished among them on the basis of their gene expression profiles.
Conclusion
This method could aid in developing tailored therapeutic strategies for CRC patients, although further validation and evaluation of its clinical significance are needed.