Lung cancer is the leading cause of cancer deaths worldwide. Identifying lung cancer risk disease sub-networks not only helps to better understand the mechanism of lung cancer, but also offers potential benefits for early diagnosis and leads to important applications such as drug targeting. Although some research has been devoted to investigating the carcinogenic process of lung cancer, these approaches still have limitations. In this paper, differentially expressed genes are scored and ranked according to the augmented fuzzy measure similarity method to obtain seed genes. A random walk with restart model is then used to identify risk disease sub-networks in the protein-protein interaction (PPI) network. In total, 37 risk disease sub-networks are identified in the PPI network, which potentially play an important role in the carcinogenic process of lung cancer. Based on evidence and comments in the existing literature, the results show that the proposed method works well in identifying significant lung cancer risk disease sub-networks, and that it is also suitable for recognizing risk disease sub-networks of other complex diseases.
Title: A seed-based approach to identify risk disease sub-networks in human lung cancer
Authors: Yi-Bin Wang, Yong-mei Cheng, Shaowu Zhang, Wei Chen
Pub Date : 2012-09-27 DOI: 10.1109/ISB.2012.6314125
Venue: 2012 IEEE 6th International Conference on Systems Biology (ISB)
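The random walk with restart step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the toy four-gene network (`g1` to `g4`), the restart probability of 0.7, and the convergence tolerance are all assumptions chosen for illustration, and the network is assumed to be undirected, unweighted, and free of isolated nodes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that, at each
    step, jumps back to a seed node with probability `restart` and otherwise
    moves to a uniformly chosen neighbour.

    adjacency: dict mapping each node to the set of its neighbours
               (assumed undirected, with every node having at least one neighbour).
    seeds:     set of seed nodes (hypothetical seed genes).
    """
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Mass that restarts at the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Mass that diffuses along edges, split evenly among neighbours.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy network with hypothetical gene names; g1 is the only seed.
adjacency = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3"},
}
scores = random_walk_with_restart(adjacency, seeds={"g1"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes with high steady-state probability are those close to the seeds in the network; a sub-network could then be formed from the top-ranked nodes, although the cut-off used for that step is not stated in the abstract.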
Pub Date : 2012-09-27 DOI: 10.1109/ISB.2012.6314145
Jingde Bu, Jiayan Wu, Meili Chen, Jingfa Xiao, Jun Yu, Xue-bin Chi, Zhong Jin
RNA-Seq is a revolutionary whole-transcriptome shotgun sequencing technology performed by high-throughput sequencers, which provides more comprehensive information on differential gene expression and benefits the identification of novel splice variants. RNA-Seq reads are so short that mapping them back to the reference effectively is a great challenge, especially when they span two or more exons. To improve mapping efficiency, we introduce BAsplice, a bi-direction alignment tool that uses RNA-Seq data to detect splice junctions without any additional information. Compared with another splice junction mapping tool, SOAPsplice, BAsplice performs better in call rate and running time, but slightly worse in accuracy. BAsplice is free, open-source software written in C. It is available at https://github.com/vlcc/basplice.
Title: BAsplice: Bi-direction alignment for detecting splice junctions
Pub Date : 2012-08-20 DOI: 10.1109/ISB.2012.6314128
M. Hayashida, M. Kamada, Jiangning Song, T. Akutsu
Understanding interactions between proteins and RNAs is essential for revealing the networks and functions of molecules in cellular systems. Many studies have analyzed and investigated interactions between protein residues and RNA bases. For interactions between protein residues, it is assumed that residues at interacting sites have co-evolved with the corresponding residues in the partner protein so as to maintain the interactions between the proteins. In our previous work, on the basis of this idea, we calculated the mutual information (MI) between residues from multiple sequence alignments of homologous proteins to identify interacting pairs of residues in interacting proteins, and combined it with the discriminative random field (DRF), which is used in image processing to extract characteristic regions from an image and is a special type of conditional random field (CRF). In a similar way, in this paper, we use mutual information to predict interactions between protein residues and RNA bases. Furthermore, we introduce labels of amino acids and bases as features of a simple two-dimensional CRF instead of a DRF. To evaluate our method, we perform computational experiments on several interactions between Pfam domains and Rfam entries. The results suggest that the CRF model with MI and labels is more useful than the CRF model with MI only.
Title: Predicting protein-RNA residue-base contacts using two-dimensional conditional random field
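The mutual information score mentioned in the abstract above can be illustrated with a short sketch. It computes the standard MI, in bits, between two alignment columns, treating each aligned pair of sequences as one joint observation. The function name and the toy columns are assumptions for illustration; how the authors pair protein and RNA sequences, handle gaps, or correct for small sample sizes is not stated in the abstract and is not modeled here.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    col_x, col_y: equal-length sequences of symbols, where position i in both
    columns comes from the same (paired) sequence in the alignment.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    if len(col_x) == 0:
        raise ValueError("columns must not be empty")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Toy columns (hypothetical): a protein residue column and an RNA base column.
covarying = mutual_information("KKRR", "AAGG")    # symbols change together
independent = mutual_information("KKRR", "AGAG")  # no association
```

A residue column and a base column that change together across the alignment receive a high score, which is the co-evolution signal the abstract describes; in the toy example the covarying pair scores 1 bit and the independent pair scores 0.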
Glioblastoma multiforme (GBM) is the most common and aggressive type of brain tumor in humans. Distinguishing “driver” mutations from passively selected “passengers” is a central challenge in computational cancer biology. Because of mutational heterogeneity, analyses that extend beyond single genes are often restricted to examining known pathways and functional modules for enrichment of somatic mutations. In this paper we present a network-based method to identify mutated core modules for tumors without any prior information other than somatic mutation and gene expression data from tumor patients. First, two networks with weighted vertices and weighted edges are constructed from the mutation and expression data, respectively. These two networks are then combined into an integrative network, on which an optimization model is used to identify the most coherent subnetworks. Significance and exclusivity tests then yield the core modules for the tumor. By applying our method to The Cancer Genome Atlas (TCGA) GBM data, we obtained three core modules, which contain not only oncogenes and tumor suppressors that have previously been implicated in GBM pathogenesis (e.g., EGFR, TP53, PTEN, NF1 and RB1), but also some genes that have rarely or never been reported in the context of glioblastoma multiforme (e.g., DST, PRAME and SYNE1). Thus, in addition to presenting a generally applicable methodology, our findings provide several GBM candidate genes for further study.
Title: Identifying mutated core modules in glioblastoma by integrative network analysis
Authors: Junhua Zhang, Shihua Zhang, Yong Wang, Junfei Zhao, Xiang-Sun Zhang
Pub Date : 2012-08-01 DOI: 10.1109/ISB.2012.6314154
Pub Date : 2012-08-01 DOI: 10.1109/ISB.2012.6314147
C. Fu, Ling Jing, S. Deng, G. Jin
Identifying oncogenic genes from comprehensive genomics data with a large sample size is challenging. Here, we apply a well-established computational model, the Bayesian factor and regression model (BFRM), to predict unknown colon cancer genes from colon adenocarcinoma genomic data. The BFRM takes advantage of its latent factors to characterize the underlying association between genes and the large number of colon cancer patients. Based on the known cancer genes in Online Mendelian Inheritance in Man (OMIM), we examined three important latent factors that characterize the heterogeneity of expression patterns related to specific oncogenic genes in the microarray data of 174 colon cancer patients. We found that the three latent factors can be used to predict unknown colon cancer genes from the known oncogenic genes. The predicted cancer genes were extensively validated using new somatic genes identified in the same patients from DNA sequencing data.
Title: Identification of oncogenic genes for colon adenocarcinoma from genomics data
Pub Date : 2012-08-01 DOI: 10.1109/ISB.2012.6314126
Zhiping Liu, Wanwei Zhang, K. Horimoto, Luonan Chen
With the rapid accumulation of functional relationships between biological molecules, knowledge-based networks have been constructed and stored in many databases. These networks provide curated and comprehensive information on the functional linkages among genes and proteins, while their activities are closely related to specific phenotypes and conditions. To evaluate a knowledge-based network in a specific condition, measuring the consistency between its structure and condition-specific gene expression profiling data is an important criterion. In this work, we propose a Gaussian graphical model to evaluate documented regulatory networks by the consistency between network architectures and time-series gene expression profiles. By developing a dynamical Bayesian network model, we derive a new method to evaluate gene regulatory networks in both simulated and real time-series microarray data. The regulatory networks are evaluated by matching network structure to gene expression, which is achieved by randomly rewiring the regulatory structures. To demonstrate the effectiveness of our method, we identify the regulatory networks that respond significantly in time-series gene expression data of circadian rhythm. Moreover, the knowledge-based networks are screened and ranked by the consistency of their structures with the dynamical gene expression.
Title: A Gaussian graphical model for identifying significantly responsive regulatory networks from time series gene expression data
Pub Date : 2012-08-01 DOI: 10.1109/ISB.2012.6314146
Xinghuo Ye, Juan Liu, Fang-Xiang Wu
Genes, transcription factors (TFs), and microRNAs (miRNAs) are well known to play important regulatory roles in dynamic biological processes. In recent years, many studies have been devoted to elucidating the transcriptional and post-transcriptional regulatory activities of TFs and miRNAs, respectively. However, very few attempts have been made to consider the dynamic characteristics of miRNA-TF-mRNA circuits, which are biological network motifs that treat miRNAs, TFs and genes as a whole, in complicated biological processes such as mouse lung development. Here we propose to mine miRNA-TF-mRNA circuits related to mouse lung development by integrating TF-mRNA, miRNA-mRNA, TF-miRNA, and time-course expression data, and to further analyze how these circuits vary across different stages of lung development. To the best of our knowledge, this is the first study to combine transcriptional and post-transcriptional information to describe mouse lung development. Our preliminary results show that miRNA-TF-mRNA circuits vary across the stages of lung development and play different roles.
Title: Dynamic miRNA-TF-mRNA circuits in mouse lung development
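As a rough illustration of how the three interaction data sets named in the abstract above could be combined, the sketch below enumerates one simple kind of circuit: a TF that regulates a miRNA, where both the TF and the miRNA target the same gene. This circuit definition is an assumption (the abstract does not specify which motif types are mined), the function and all identifiers such as `TF1`, `miR-x`, and `geneA` are hypothetical placeholders, and the time-course expression data used in the paper is not modeled.

```python
def find_circuits(tf_to_gene, mirna_to_gene, tf_to_mirna):
    """Enumerate (miRNA, TF, gene) triples in which the TF regulates the miRNA
    and both the TF and the miRNA target the same gene.

    Each argument is an iterable of (regulator, target) pairs.
    """
    # Index the targets of each regulator once for fast lookup.
    tf_targets = {}
    for tf, gene in tf_to_gene:
        tf_targets.setdefault(tf, set()).add(gene)
    mirna_targets = {}
    for mirna, gene in mirna_to_gene:
        mirna_targets.setdefault(mirna, set()).add(gene)

    circuits = set()
    for tf, mirna in tf_to_mirna:
        shared = tf_targets.get(tf, set()) & mirna_targets.get(mirna, set())
        for gene in shared:
            circuits.add((mirna, tf, gene))
    return sorted(circuits)


# Toy interaction lists with placeholder names (not real regulatory data).
tf_to_gene = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")]
mirna_to_gene = [("miR-x", "geneA"), ("miR-y", "geneC")]
tf_to_mirna = [("TF1", "miR-x"), ("TF2", "miR-x")]

circuits = find_circuits(tf_to_gene, mirna_to_gene, tf_to_mirna)
```

Stage-specific variation could then be studied by keeping only the circuits whose members are expressed at a given time point, but that filtering depends on thresholds the abstract does not describe.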