Sivakorn Kozuevanich, Jonathan H. Chan, A. Meechai
Proceedings of the Tenth International Conference on Computational Systems-Biology and Bioinformatics, 2019-12-04. DOI: 10.1145/3365953.3365964 (https://doi.org/10.1145/3365953.3365964)
Feature selection in GSNFS-based marker identification
Gene Sub-Network-based Feature Selection (GSNFS) is a method for identifying gene sub-network biomarkers through an integrated analysis of gene expression, gene-set and network data; it can handle both case-control and multiclass studies. It has previously been shown to identify sub-network markers for lung cancer reasonably well. However, previous studies have not assessed the importance of each sub-network identified by GSNFS. In this work, we applied correlation-based and information gain feature selection techniques to rank the identified sub-network biomarkers (gene-sets). First, the top-5 and bottom-5 ranked gene-sets were selected and their classification performance was investigated. As expected, the top-ranked gene-sets gave excellent performance, while the bottom-ranked gene-sets performed poorly. The top-ranked gene-sets, such as the MAPK signalling pathway, are known to be related to cancer. Furthermore, combining the top-ranked gene-sets, from the top 2 up to the top 30, further improved performance compared with using individual gene-sets. The results of this study are promising: significantly fewer sub-networks were needed to build a classifier, and that classifier performed comparably to one built on the full data set.
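
The ranking step described above can be illustrated with a minimal sketch of information gain ranking. This is not the authors' implementation. The abstract does not say how gene-set activity is scored, how continuous values are discretized, or which classifier is used, so the median split, the function names, the gene-set names and the toy numbers below are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of a numeric feature with respect to the class labels.

    The feature is discretized with a median split, which is an assumption
    of this sketch and not something stated in the abstract.
    """
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    low = [y for v, y in zip(values, labels) if v < median]
    high = [y for v, y in zip(values, labels) if v >= median]
    gain = entropy(labels)
    for part in (low, high):
        if part:  # skip an empty side of the split
            gain -= (len(part) / len(labels)) * entropy(part)
    return gain


def rank_gene_sets(scores, labels):
    """Return gene-set names sorted by decreasing information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in scores.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Made-up activity scores for three hypothetical gene-sets over six samples.
labels = ["case", "case", "case", "control", "control", "control"]
scores = {
    "set_A": [2.1, 1.9, 2.4, 0.3, 0.2, 0.5],  # separates the two classes cleanly
    "set_B": [1.0, 0.2, 1.1, 0.9, 0.3, 1.2],  # weakly informative
    "set_C": [0.5, 0.6, 0.4, 0.5, 0.6, 0.4],  # uninformative
}

ranking = rank_gene_sets(scores, labels)
top_k = ranking[:2]  # the k best gene-sets would then be passed to a classifier
print(ranking)
```

In a setting like the one the abstract describes, the top-ranked names would select which gene-set features a classifier is trained on, and varying the cutoff (for example from 2 up to 30) would show how performance changes with the number of gene-sets kept.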