A fast and noise-adaptive rough-fuzzy hybrid algorithm for medical image segmentation
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706602 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
A. Srivastava, Abhinav Asati, M. Bhattacharya
Accurate, fast and noise-adaptive segmentation of brain MR images for clinical analysis is a challenging problem. An improved hybrid clustering algorithm is presented that integrates the recently popularized concept of rough sets with that of fuzzy sets. The lower and upper approximations of rough sets are incorporated to handle uncertainty, vagueness and incompleteness in class definition. To make the segmentation robust to noise and intensity inhomogeneity, the images are pre-processed with a neighbourhood-averaging spatial filter. To accelerate segmentation, a novel Suppressed Rough Fuzzy C-Means model is presented, in which a membership suppression mechanism creates competition among clusters and speeds up the clustering process. The effectiveness of the algorithm, together with a comparison against related algorithms, is demonstrated on a set of MR and CT scan images. Results on MRI data show that the method outperforms standard Fuzzy C-Means-based algorithms and other similar modified techniques.
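The abstract combines two mechanisms: a neighbourhood-averaging pre-filter and membership suppression inside fuzzy c-means. The sketch below shows only those two pieces. It assumes a 3x3 averaging window, 1-D grey-level intensities, and the usual suppressed-FCM rule, in which the winning membership becomes 1 - alpha + alpha*u and every other membership is multiplied by alpha. It deliberately omits the rough-set lower/upper approximation step of the authors' model. The names `mean_filter` and `suppressed_fcm` are hypothetical.

```python
def mean_filter(img):
    """3x3 neighbourhood average; border pixels average only their in-image neighbours."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


def suppressed_fcm(x, centers, m=2.0, alpha=0.5, iters=50, eps=1e-9):
    """Suppressed fuzzy c-means on 1-D intensities x, starting from the given centers.

    alpha = 1 gives plain FCM; smaller alpha suppresses non-winning memberships more.
    """
    centers = [float(v) for v in centers]
    c = len(centers)
    u = []
    for _ in range(iters):
        u = []
        for xi in x:
            d = [abs(xi - v) + eps for v in centers]  # eps avoids division by zero
            row = [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                   for i in range(c)]
            # Suppression: the winner p gets 1 - alpha + alpha*u_p and every other
            # cluster is scaled by alpha, so the row still sums to 1.
            p = max(range(c), key=row.__getitem__)
            row = [alpha * ui for ui in row]
            row[p] += 1.0 - alpha
            u.append(row)
        # Standard FCM centroid update using the suppressed memberships.
        centers = [sum(u[j][i] ** m * x[j] for j in range(len(x))) /
                   sum(u[j][i] ** m for j in range(len(x)))
                   for i in range(c)]
    return centers, u
```

With `alpha = 1` the loop reduces to ordinary FCM. Lowering `alpha` pushes each pixel more firmly toward its winning cluster, which is the competition effect the abstract credits for faster clustering.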
Sparse nonnegative matrix factorization with the elastic net
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706574 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Weixiang Liu, Songfeng Zheng, Sen Jia, L. Shen, Xianghua Fu
Nonnegative matrix factorization is used extensively for feature extraction and clustering analysis. Recently, many sparsity constraints, such as the L1 penalty, have been introduced for sparse nonnegative matrix factorization. Inspired by sparsity measures from linear regression models, this paper proposes integrating nonnegative matrix factorization with another sparsity constraint, the elastic net. Clustering experiments on three gene expression datasets demonstrate the effectiveness of the proposed method.
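One common way to add an elastic-net penalty to NMF is to penalize the coefficient matrix H with l1*sum(H) + (l2/2)*||H||_F^2 and then use Lee-Seung-style multiplicative updates. The sketch below assumes that formulation: a Frobenius loss, with the penalty applied to H only. The abstract does not say which factor is penalized or how the problem is solved, so this is an illustration rather than the authors' algorithm. `enet_nmf` and `objective` are hypothetical names.

```python
import random


def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def objective(A, W, H, l1, l2):
    """0.5*||A - WH||_F^2 + l1*sum(H) + 0.5*l2*||H||_F^2 (H >= 0, so sum(H) is its L1 norm)."""
    WH = matmul(W, H)
    fit = 0.5 * sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))
    penalty = sum(l1 * h + 0.5 * l2 * h * h for row in H for h in row)
    return fit + penalty


def enet_nmf(A, k, l1=0.1, l2=0.1, iters=200, seed=0, eps=1e-12):
    """Multiplicative updates for A ~ W H with an elastic-net penalty on H."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H step: the penalty gradient l1 + l2*H joins the denominator.
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + l1 + l2 * H[i][j] + eps)
              for j in range(m)] for i in range(k)]
        # W step: ordinary Lee-Seung update (W is not penalized in this sketch).
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(k)] for i in range(n)]
    return W, H
```

Multiplicative updates keep every entry non-negative automatically. The L1 term pushes small entries of H toward zero, and the L2 term stabilizes them when columns of W are correlated.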
Metabolomic profiling for biomarker discovery in pancreatic cancer
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706618 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Prabhjit Kaur, K. Sheikh, A. Kirilyuk, Ksenia Kirilyuk, H. Ressom, A. Cheema, B. Kallakury
Pancreatic cancer (PC) is the fourth leading cause of cancer death in the United States, with a 5-year survival rate of 4%. Patients with pancreatic cancer are usually diagnosed at late stages, when the disease is incurable. Sensitive and specific biomarkers are therefore critical for supporting new prevention, diagnostic or therapeutic strategies. Here, we report mass spectrometry-based metabolomic profiling of human pancreatic tumor and normal tissue. Multivariate data analysis shows significant alterations in the tumor metabolome compared with normal tissue. These findings offer an information-rich matrix for discovering novel biomarkers with diagnostic or prognostic potential.
Link-based cluster ensembles for heterogeneous biological data analysis
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706631 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Natthakan Iam-on, Simon M. Garrett, C. Price, Tossapon Boongoen
Clinical data have traditionally been the major factor in cancer prognosis. However, this classic approach may be ineffective for analyzing morphologically indistinguishable tumor subtypes, and microarray technology has emerged as a promising alternative. Despite a large number of microarray studies, the clinical application of gene expression data analysis remains limited by the complexity and noise level of the generated data. Recently, integrative cluster analysis of both clinical and gene expression data has been shown to be an effective way to overcome these problems. This paper presents a novel, accurate cluster-ensemble method for analyzing heterogeneous biological data. It overcomes the problem of selecting an appropriate clustering algorithm, or the parameter settings of any candidate algorithm, especially for a new dataset. Evaluation on real biological and benchmark datasets suggests that the proposed model achieves higher quality than many state-of-the-art cluster ensemble techniques and standard clustering algorithms. Its performance is also robust to parameter perturbation, making it a reliable and useful tool for data analysts and bioinformaticians. Online supplementary material is available at http://users.aber.ac.uk/nii07/bibm2010.
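One way a link-based ensemble can refine the usual co-association matrix works like this: when two points fall in different clusters of a base clustering, they still receive partial credit in proportion to how similar those two clusters are in the cluster-link graph. The sketch below makes three illustrative assumptions that are not taken from the abstract:

- cluster similarity is a weighted connected-triple score, with Jaccard overlaps as edge weights;
- the damping constant is `dc = 0.8`;
- the consensus is a single-linkage cut at `threshold = 0.5`.

All function names are hypothetical.

```python
from itertools import combinations


def base_clusters(labelings):
    """(clustering index, label, member set) for every cluster of every base clustering."""
    out = []
    for t, labels in enumerate(labelings):
        for lab in sorted(set(labels)):
            out.append((t, lab, frozenset(i for i, l in enumerate(labels) if l == lab)))
    return out


def wct_similarity(clusters):
    """Weighted connected-triple similarity between clusters, scaled to [0, 1].
    Edge weights are Jaccard overlaps; two clusters are similar when both are
    linked to the same third clusters."""
    n = len(clusters)
    w = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        sa, sb = clusters[a][2], clusters[b][2]
        w[a][b] = w[b][a] = len(sa & sb) / len(sa | sb)
    sim = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        sim[a][b] = sim[b][a] = sum(min(w[a][k], w[b][k]) for k in range(n) if k not in (a, b))
    top = max(max(row) for row in sim) or 1.0
    return [[s / top for s in row] for row in sim]


def link_based_consensus(labelings, dc=0.8, threshold=0.5):
    """Refined co-association plus a single-linkage cut; returns consensus labels."""
    clusters = base_clusters(labelings)
    index = {(t, lab): c for c, (t, lab, _) in enumerate(clusters)}
    sim = wct_similarity(clusters)
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        score = 0.0
        for t, labels in enumerate(labelings):
            if labels[i] == labels[j]:
                score += 1.0
            else:  # partial credit from the similarity of the two clusters
                score += dc * sim[index[(t, labels[i])]][index[(t, labels[j])]]
        if score / len(labelings) >= threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Because the refined score never requires choosing a single "best" base clustering, disagreeing partitions such as a split cluster still contribute information, which is the motivation the abstract gives for ensembles.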
A graph-based elastic net for variable selection and module identification for genomic data analysis
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706591 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Zheng Xia, Xiao-feng Zhou, Wei Chen, Chunqi Chang
Recently, a network-constrained regression model [1] was proposed to incorporate prior biological knowledge into regression and variable selection. In that method, an l1-norm penalty on the coefficients imposes sparsity, while a Laplacian penalty on the biological graph encourages the coefficients to be smooth along the network. However, the grouping effect of this Laplacian smoothness penalty exists only when two connected genes both have positive, or both have negative, effects on the response. To overcome this problem, we propose applying the Laplacian penalty to the absolute values of the coefficients, so that both positive and negative effects are taken into account. We call the method the graph-based elastic net (GENet) because it has a grouping effect similar to that of the elastic net (ENet) [2], except that in GENet the smoothness between two coefficients is specified by the network. We further develop an efficient algorithm, in the same spirit as LARS [3], to solve the optimization problem. Simulation studies show that the proposed method performs better than network-constrained regularization without absolute values. Application to an Alzheimer's disease (AD) microarray gene-expression dataset identified several subnetworks in Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to the progression of AD. Many of these findings are confirmed by the published literature.
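The difference between the two smoothness terms is easiest to see numerically. The sketch below assumes a degree-normalized Laplacian term, sum over edges (u, v) of (b_u/sqrt(d_u) - b_v/sqrt(d_v))^2, which is our reading of the network-constrained model [1]. It compares that term with the same term applied to absolute values, as the abstract describes for GENet. Function names are hypothetical, and no solver is included (the abstract's LARS-like algorithm is not reproduced).

```python
import math


def laplacian_penalty(beta, edges, use_abs):
    """Sum over edges (u, v) of (b_u/sqrt(d_u) - b_v/sqrt(d_v))**2, with d the node degree.
    With use_abs=True the coefficients enter as |b|, as described for GENet."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total = 0.0
    for u, v in edges:
        bu, bv = (abs(beta[u]), abs(beta[v])) if use_abs else (beta[u], beta[v])
        total += (bu / math.sqrt(degree[u]) - bv / math.sqrt(degree[v])) ** 2
    return total


def genet_penalty(beta, edges, lam1, lam2):
    """lam1 * ||beta||_1 + lam2 * (absolute-value Laplacian term)."""
    return lam1 * sum(abs(b) for b in beta) + lam2 * laplacian_penalty(beta, edges, use_abs=True)
```

Take two linked genes with effects +1 and -1. The signed term charges 4, while the absolute-value term charges 0. This is exactly the opposite-sign case the abstract says the original penalty mishandles.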
Cis-regulatory module detection using constraint programming
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706592 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Tias Guns, Hong Sun, K. Marchal, Siegfried Nijssen
We propose a method for finding cis-regulatory modules (CRMs) in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors, and we wish to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is inherently a combinatorial problem, which we solve by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of the transcription factors, the number of sequences in which they co-occur and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We validate our approach experimentally and compare it with state-of-the-art techniques.
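The combinatorial core of the problem can be illustrated with brute-force itemset enumeration. Each candidate module is a set of transcription factors, and it is kept if all of its factors have binding sites within `window` base pairs in at least `min_support` sequences. This sketch assumes binding sites are already given as `(tf, position)` pairs. It uses plain enumeration instead of a constraint-programming solver and skips the background-significance test. The names are hypothetical.

```python
from itertools import combinations


def occurs_within(sites, tfs, window):
    """True if every factor in tfs has a site inside one stretch of `window` bp.
    sites: list of (tf, position) pairs for one sequence."""
    wanted = set(tfs)
    for start in sorted(p for tf, p in sites if tf in wanted):
        inside = {tf for tf, p in sites if tf in wanted and start <= p <= start + window}
        if inside == wanted:
            return True
    return False


def find_modules(sequences, min_support, window, max_size=3):
    """Enumerate TF sets of size 2..max_size and keep those meeting both constraints."""
    factors = sorted({tf for seq in sequences for tf, _ in seq})
    modules = []
    for size in range(2, max_size + 1):
        for combo in combinations(factors, size):
            support = [i for i, seq in enumerate(sequences) if occurs_within(seq, combo, window)]
            if len(support) >= min_support:
                modules.append((combo, support))
    return modules
```

Any sequence that supports a module also supports each of its subsets, so support is anti-monotone. Itemset-mining and constraint-programming systems exploit that property to prune the search; this sketch does not.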
Predicting human microRNA-disease associations based on support vector machine
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706611 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Qinghua Jiang, Guohua Wang, Tianjiao Zhang, Yadong Wang
The identification of disease-related microRNAs is vital for understanding the pathogenesis of disease at the molecular level and may lead to the design of specific molecular tools for diagnosis, treatment and prevention. Experimental identification of disease-related microRNAs is difficult, and computational prediction of microRNA-disease associations is a complementary approach. However, one major issue in microRNA studies is the lack of bioinformatics programs that accurately predict microRNA-disease associations. Herein, we present a machine learning-based approach for distinguishing positive microRNA-disease associations from negative ones. A set of features was extracted for each positive and negative microRNA-disease association, and a support vector machine (SVM) classifier was trained, achieving an area under the ROC curve of up to 0.8884 in 10-fold cross-validation. This indicates that the SVM-based approach can be used to predict potential microRNA-disease associations and to formulate testable hypotheses that guide future biological experiments.
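The evaluation protocol (an SVM scored by the area under the ROC curve in 10-fold cross-validation) can be sketched in plain Python. This version makes three assumptions:

- a linear SVM trained with the Pegasos stochastic sub-gradient method;
- out-of-fold decision values pooled before computing a single AUC;
- hypothetical numeric feature vectors as input.

The authors' kernel, feature set and per-fold averaging are not given in the abstract.

```python
import random


def _augment(x):
    """Append a constant 1.0 so the last weight acts as a (regularized) bias."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x = _augment(X[i])
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def decision(w, x):
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_val_auc(X, y, k=10, seed=0):
    """k-fold CV: pool the out-of-fold decision values, then compute one AUC."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    scores = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = decision(w, X[i])
    return auc(scores, y)
```

Pooling is only one convention; averaging per-fold AUCs is also common and requires every fold to contain both classes.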
A novel reinforcement learning framework for online adaptive seizure prediction
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706617 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Shouyi Wang, W. Chaovalitwongse, Stephen Wong
Epileptic seizure prediction is still a very challenging and unsolved problem for medical professionals. The current bottleneck of seizure prediction techniques is their lack of flexibility across patients, who exhibit a great variety of epileptic seizures. This study proposes a novel self-adaptation mechanism that combines reinforcement learning, online monitoring and adaptive control theory for seizure prediction. The proposed method eliminates the need for a sophisticated threshold-tuning/optimization process and has great potential for flexibility and adaptability across a wide range of patients with various types of seizures. The prediction system was tested on five patients with epilepsy. With the best parameter settings, it achieved an average accuracy of 71.34%, considerably better than a chance model. The system's autonomous adaptation offers a promising path toward practical online seizure prediction techniques for physicians and patients.
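As a rough illustration of how a reinforcement signal can replace a hand-tuned alarm threshold, the sketch below runs tabular Q-learning with a discount of zero, so it behaves as a contextual bandit. The input is a hypothetical stream of discretized EEG feature levels. Raising an alarm in a pre-ictal window, or staying quiet outside one, earns +1; anything else earns -1. The state encoding, the reward and all names are assumptions. The abstract does not describe the authors' reward design or their adaptive-control component.

```python
import random


def simulate_stream(n, preictal_prob, seed=0):
    """Hypothetical data: (state, is_preictal) pairs, where state is a discretized EEG
    feature level and preictal_prob[state] is the chance that the window is pre-ictal."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        s = rng.randrange(len(preictal_prob))
        stream.append((s, rng.random() < preictal_prob[s]))
    return stream


def q_learning_alarm(stream, n_states, alpha=0.1, epsilon=0.1, seed=0):
    """Online tabular Q-learning with discount 0. Actions: 0 = stay quiet, 1 = raise alarm."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for state, preictal in stream:
        if rng.random() < epsilon:  # explore
            action = rng.randrange(2)
        else:  # exploit; ties go to "stay quiet"
            action = 0 if q[state][0] >= q[state][1] else 1
        reward = 1.0 if action == int(preictal) else -1.0
        # With discount 0 the update target is just the immediate reward.
        q[state][action] += alpha * (reward - q[state][action])
    return q


def policy(q):
    """Greedy action per state."""
    return [0 if quiet >= alarm else 1 for quiet, alarm in q]
```

The learned policy effectively places the alarm threshold between feature levels from feedback alone, so no threshold is tuned by hand. A practical system would also need delayed rewards, because a seizure is confirmed only after the prediction window.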
Non-negative matrix and tensor factorization based classification of clinical microarray gene expression data
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706606 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Yifeng Li, A. Ngom
Non-negative information can benefit the analysis of microarray data. This paper investigates the classification performance of non-negative matrix factorization (NMF) on gene-sample data. We also extend it to a higher-order version for classifying clinical time-series data represented as tensors. Experiments show that NMF and higher-order NMF can achieve at least comparable prediction performance.
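A common recipe for NMF-based classification has three steps: factor the training matrix, project each test sample onto the learned non-negative basis with the basis held fixed, and classify in the low-dimensional coefficient space. The sketch below assumes Lee-Seung multiplicative updates, samples stored as rows, and a nearest-centroid rule. The classifier actually used in the paper and the higher-order (tensor) extension are not shown, and the function names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(X, k, iters=300, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for X ~ U V under squared error.
    X: samples x genes, U: samples x k coefficients, V: k x genes basis."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    V = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Vt = transpose(V)
        num, den = matmul(X, Vt), matmul(U, matmul(V, Vt))
        U = [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        Ut = transpose(U)
        num, den = matmul(Ut, X), matmul(matmul(Ut, U), V)
        V = [[V[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
    return U, V


def project(x, V, iters=300, eps=1e-12):
    """Non-negative coefficients u with x ~ u V, keeping the basis V fixed."""
    Vt = transpose(V)
    xv = matmul([x], Vt)[0]
    vvt = matmul(V, Vt)
    u = [1.0] * len(V)
    for _ in range(iters):
        den = matmul([u], vvt)[0]
        u = [ui * a / (d + eps) for ui, a, d in zip(u, xv, den)]
    return u


def nmf_nearest_centroid(train_X, train_y, test_X, k=2):
    """Classify test samples by the nearest class centroid in NMF coefficient space."""
    U, V = nmf(train_X, k)
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [u for u, lab in zip(U, train_y) if lab == label]
        centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
    predictions = []
    for x in test_X:
        u = project(x, V)
        predictions.append(min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(u, centroids[lab]))))
    return predictions
```

Holding V fixed during projection keeps test coefficients on the same scale as the training coefficients. Any classifier (SVM, kNN) could replace the nearest-centroid rule.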
Discovering functional gene pathways associated with cancer heterogeneity via sparse supervised learning
Pub Date: 2010-12-01 | DOI: 10.1109/BIBM.2010.5706572 | 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Shuichi Kawano, Teppei Shimamura, A. Niida, S. Imoto, R. Yamaguchi, Masao Nagasaki, Ryo Yoshida, C. Print, S. Miyano
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data using sparse probabilistic principal component analysis. A pathway activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for choosing the tuning parameters. The method can also reverse-engineer gene networks from the identified pathways, enabling the discovery of novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process through the analysis of breast cancer gene expression data.
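The two-stage pipeline in the abstract (pathway activities, then an elastic-net logistic model) can be sketched as follows. For brevity this sketch departs from the paper in three ways:

- it substitutes ordinary first-principal-component scores (computed by power iteration) for sparse probabilistic PCA;
- it fixes the penalties `lam1` and `lam2` by hand instead of using the authors' model selection criterion;
- it fits the model by proximal gradient descent.

`simulate_expression` produces hypothetical data, and all function names are hypothetical.

```python
import math
import random


def simulate_expression(n=60, seed=0):
    """Hypothetical data: genes 0-2 share a latent factor z that drives the phenotype;
    genes 3-5 share an unrelated factor."""
    rng = random.Random(seed)
    expr, y = [], []
    for _ in range(n):
        z, u = rng.gauss(0, 1), rng.gauss(0, 1)
        expr.append([z + rng.gauss(0, 0.3) for _ in range(3)] +
                    [u + rng.gauss(0, 0.3) for _ in range(3)])
        y.append(1 if z > 0 else 0)
    return expr, y


def first_pc_scores(X, iters=100):
    """Scores of the rows of X (samples x genes) on the leading principal component,
    found by power iteration on the centred data."""
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Z = [[v - mu for v, mu in zip(row, means)] for row in X]
    v = [1.0] * p
    for _ in range(iters):
        zv = [sum(a * b for a, b in zip(row, v)) for row in Z]
        w = [sum(Z[i][j] * zv[i] for i in range(n)) for j in range(p)]  # Z^T Z v
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) for row in Z]


def standardize(col):
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n) or 1.0
    return [(x - mu) / sd for x in col]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def enet_logistic(X, y, lam1=0.05, lam2=0.05, lr=0.1, iters=2000):
    """Proximal gradient for mean logistic loss + lam1*||b||_1 + (lam2/2)*||b||^2.
    y holds 0/1 labels; the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    beta, b0 = [0.0] * p, 0.0
    for _ in range(iters):
        r = [sigmoid(sum(bj * xj for bj, xj in zip(beta, x)) + b0) - yi
             for x, yi in zip(X, y)]
        grad = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        b0 -= lr * sum(r) / n
        # Elastic-net proximal step: soft-threshold, then shrink.
        beta = [soft_threshold(bj - lr * gj, lr * lam1) / (1.0 + lr * lam2)
                for bj, gj in zip(beta, grad)]
    return beta, b0


def pathway_activity_model(expr, pathways, y, **kwargs):
    """expr: samples x genes; pathways: lists of gene indices; y: 0/1 phenotype."""
    activities = [standardize(first_pc_scores([[row[g] for g in genes] for row in expr]))
                  for genes in pathways]
    design = [list(r) for r in zip(*activities)]  # samples x pathways
    return enet_logistic(design, y, **kwargs)
```

Standardizing the activities keeps a single step size `lr` stable across pathways. The sign of each coefficient is arbitrary, because a principal component is defined only up to sign.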