Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706564
Cen Gao, Jing Li
MicroRNAs (miRNAs) are a type of noncoding RNA that regulates target mRNAs before they are translated into proteins. Although it has been demonstrated that regulation occurs through partial binding between the seed region of a miRNA and its targets, the mechanism of this process is not fully understood. Some biological experiments have shown that even perfect base pairing in the seed region does not always guarantee down-regulation of the targets. It has been suspected that other characteristics of mRNAs may facilitate the regulation. An earlier study (1) identified five additional features beyond seed matching that seem to significantly affect repression. However, evolutionarily conserved targets show significantly more destabilization than nonconserved targets with the same score under these five features, which suggests that additional features remain to be discovered. This motivates our study to identify additional features that may differentiate down-regulated mRNAs (positive set) from those that are not down-regulated (negative set), provided both sets have perfect seed matches with miRNAs. Our first attempt, a search for sequence motifs around seed site regions that differ between the two sets, was not successful. We then construct a set of 18 sequence/structure features based on domain knowledge and evaluate them individually and jointly. By employing feature selection techniques in combination with several classification methods, we identify a subset of features that may facilitate the down-regulation of mRNAs. Our results can be incorporated into target prediction algorithms to further improve target specificity.
Title : Machine learning approaches for the investigation of features beyond seed matches affecting miRNA binding. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
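The "perfect seed match" condition that defines both the positive and negative sets can be illustrated with a short sketch. It assumes the common convention that the seed is miRNA nucleotides 2 to 8 and that only Watson-Crick pairs count (no G:U wobble). The function name and parameters are hypothetical and are not taken from the paper.

```python
def seed_match_sites(mirna: str, utr: str, seed_start: int = 2, seed_end: int = 8) -> list[int]:
    """Return 0-based positions in `utr` that pair perfectly with the miRNA seed.

    Assumption: the seed is nucleotides seed_start..seed_end (1-based, inclusive)
    of the miRNA, and only Watson-Crick pairs are allowed.
    """
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    # Normalize DNA-style input (T) to RNA (U).
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    # With both strands written 5'->3', the target site is the reverse
    # complement of the seed.
    site = "".join(complement[base] for base in reversed(seed))
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


if __name__ == "__main__":
    # let-7a seed GAGGUAG pairs with the site CUACCUC.
    print(seed_match_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAA"))
```

Additional features such as those studied in the abstract would then be computed around each returned position.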
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706585
Andrew K. Rider, Geoffrey H. Siwo, S. Emrich, M. Ferdig, N. Chawla
Clustering is a common step in the analysis of microarray data. Microarrays enable simultaneous high-throughput measurement of the expression levels of many genes. These data can be used to explore relationships between genes and can guide drug development and further research. A typical first step in the analysis of these data is to apply an agglomerative hierarchical clustering algorithm to the correlations between all gene pairs. While this simple approach has been successful, it fails to identify many genetic interactions that may be important for drug design and other applications. We present an approach to clustering expression data that uses known gene-gene interaction data to improve the results of commonly used clustering techniques. The approach creates an ensemble similarity measure that can be used as input to common clustering techniques and yields results with increased biological significance without altering the clustering algorithm itself.
Title : A supervised learning approach to the unsupervised clustering of genes. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
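The idea of an ensemble similarity measure that feeds an unchanged clustering algorithm can be sketched as follows. This is only an illustration: the abstract does not say how the measure is built, so the fixed weighted average of rescaled Pearson correlation and a known-interaction indicator is an assumption, and every name here is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def ensemble_similarity(expr, interactions, weight=0.5):
    """Blend expression correlation with known gene-gene interactions.

    expr: gene -> list of expression values.
    interactions: set of frozenset({gene_a, gene_b}) for known interactions.
    weight: share given to the interaction evidence (an assumed constant).
    """
    genes = sorted(expr)
    sim = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            corr = (pearson(expr[g], expr[h]) + 1.0) / 2.0  # rescale to [0, 1]
            known = 1.0 if frozenset((g, h)) in interactions else 0.0
            sim[(g, h)] = (1.0 - weight) * corr + weight * known
    return sim
```

Converting each value to a distance (for example `1 - similarity`) gives a matrix that any standard hierarchical clustering routine can consume without modification.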
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706643
Mehmet Tan, Faruk Polat, R. Alhajj
Graph classification is important for many scientific applications; it can be exploited in various problems related to bioinformatics and cheminformatics. There is an increasing need to classify small molecules, represented as graphs, in order to predict properties such as activity, toxicity, or mutagenicity. Using subtrees as the feature set for graph classification in kernel methods has been shown to perform well in classifying small molecules. It is also well known that feature selection can improve the performance of classifiers. However, most graph kernels are not selective in choosing which subtrees to include in the set of features. Instead, they use all subtrees with a certain property as their feature set. We argue that not all of these features are needed for effective classification. In this paper, we investigate the effect of selecting a subset of the subtrees as features for graph kernels, i.e., we try to identify and keep useful features; all the remaining subtrees are eliminated. A masking procedure, which boils down to feature selection, is proposed for classifying graphs. We conducted experiments on several molecule classification datasets; the results demonstrate the applicability and effectiveness of the proposed feature selection process.
Title : Feature selection for graph kernels. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
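Masking as feature selection can be sketched as a kernel that only sums over the subtree patterns that are kept. The example assumes each graph has already been turned into a dictionary of subtree-pattern counts; how the paper enumerates subtrees and chooses the mask is not stated in the abstract, so the names and the pattern strings below are hypothetical.

```python
def masked_kernel(phi_g, phi_h, mask):
    """Dot product of two subtree-count feature maps restricted to `mask`.

    phi_g, phi_h: pattern -> count of that subtree pattern in the graph.
    mask: set of patterns that are kept; every other pattern is ignored.
    """
    return sum(count * phi_h.get(pattern, 0)
               for pattern, count in phi_g.items()
               if pattern in mask)


if __name__ == "__main__":
    g = {"C-C": 2, "C-O": 1, "C-N": 3}
    h = {"C-C": 1, "C-N": 2}
    print(masked_kernel(g, h, set(g) | set(h)))   # unmasked kernel value
    print(masked_kernel(g, h, {"C-C", "C-O"}))    # kernel after masking out "C-N"
```

Because the masked kernel is still an inner product over a subset of coordinates, it remains a valid kernel for methods such as support vector machines.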
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706653
Meenakshi Mishra, Hongliang Fei, Jun Huan
As the number of chemicals developed and in use grows every year, obtaining a toxicity profile for each chemical becomes a daunting challenge. To close this information gap, the EPA suggested that certain in vitro assays and computational methods, which predict toxicity-related information in much less time and at lower cost than traditional in vivo methods, may be used. In this paper, we apply computational techniques to results from certain in vitro assays on 309 chemicals (whose toxicity profiles are readily available), together with molecular descriptors and other computed physical-chemical properties of the chemicals, to predict the toxicity caused by a chemical at a particular endpoint. The dataset is available online from the EPA TOXCAST group. We show that Random Forest and Naïve Bayes perform well on this dataset. We also show that using small and related trees in the random forest helps to further improve performance.
Title : Computational prediction of toxicity. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706630
Jintao Zhang, G. Lushington, Jun Huan
Despite intense investment growth and technology development, a bottleneck in drug discovery and development has been observed over the past decade. The NIH started the Molecular Libraries Initiative (MLI) in 2004 to enlarge the pool of potential drug targets, especially from the “undruggable” part of the human genome, and the pool of potential drug candidates drawn from a much broader range of drug-like small molecules. In this paper, we use concepts from network biology to integrate MLI data with other biological databases such as DrugBank and UniHI, and evaluate the potential of MLI target proteins as new drug targets. Our analysis provides some measures of the value of the MLI data as a resource for both basic chemical biology research and future therapeutic discovery.
Title : Exploratory analysis of the BioAssay Network with implications to therapeutic discovery. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706554
Xin Chen, Xiaohua Hu, Xiajiong Shen, G. Rosen
Recently, the concept of a species containing both core and distributed genes, known as the supra- or pangenome theory, has been introduced. In this paper, we aim to develop a new method that analyzes the genome-level composition of DNA sequences in order to characterize a set of common genomic features shared by the same species and to identify their functional roles. To this end, we first apply a composition-based approach that breaks DNA sequences into sub-reads called ‘N-mers’ and represents the sequences by N-mer frequencies. Then, we introduce the Latent Dirichlet Allocation (LDA) model to study the genome-level statistical patterns (a.k.a. latent topics) of the ‘N-mer’ features. Each estimated latent topic represents a certain component of the whole genome. With the help of the BioJava toolkit, we access the gene region information of reference sequences from the NCBI database. We use our data mining framework to investigate two questions: 1) do strains within a species share similar core and distributed topics? and 2) do genes with similar functional roles contain similar latent topics? After studying the mutual information between latent topics and gene regions, we provide examples of each, where the BioCyc database is used to correlate pathway and reaction information with the genes. The examples demonstrate the effectiveness of the proposed method.
Title : Probabilistic topic modeling for genomic data interpretation. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
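The composition-based step, representing a sequence by its N-mer frequencies, can be sketched in a few lines. The sketch assumes overlapping windows with a stride of one; the abstract does not say whether the sub-reads overlap, and the function name is hypothetical. The resulting frequency vectors are the kind of "bag of words" input a topic model such as LDA works on.

```python
from collections import Counter


def nmer_frequencies(sequence: str, n: int) -> dict[str, float]:
    """Relative frequency of every overlapping N-mer in a DNA sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    # A sequence shorter than n has no N-mers and yields an empty profile.
    return {nmer: count / total for nmer, count in counts.items()} if total else {}


if __name__ == "__main__":
    # Windows of "ACGAC" with n = 2: AC, CG, GA, AC
    print(nmer_frequencies("ACGAC", 2))
```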
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706645
Jin Xu, Qiwei Li, Xiaodan Fan, V. Li, S. Li
The Evolutionary Monte Carlo (EMC) algorithm is an effective and powerful method for sampling from complicated distributions. The short adjacent repeats identification problem (SARIP), i.e., searching for the common sequence pattern in multiple DNA sequences, is considered one of the key challenges in the field of bioinformatics. A recently proposed Markov chain Monte Carlo (MCMC) algorithm has demonstrated its effectiveness in solving SARIP. However, high computation time and unavoidable local optima hinder its wide application. In this paper, we apply EMC to parallelize the MCMC algorithm for solving SARIP. Our proposed EMC scheme is implemented on a parallel platform, and the simulation results show that, compared with the conventional MCMC algorithm, EMC not only improves the quality of the final solution but also reduces the computation time.
Title : An Evolutionary Monte Carlo algorithm for identifying short adjacent repeats in multiple sequences. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
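One ingredient that lets EMC escape local optima is the exchange move between chains running at different temperatures. The sketch below shows only the standard Metropolis acceptance probability for such an exchange, assuming each chain i samples from a density proportional to exp(-H(x)/t_i). It is not the authors' SARIP implementation, and it leaves out the mutation and crossover moves that EMC also uses.

```python
import math


def exchange_acceptance(energy_i: float, energy_j: float,
                        temp_i: float, temp_j: float) -> float:
    """Probability of accepting a state swap between chains i and j.

    Swapping multiplies the joint target density by
    exp((H_i - H_j) * (1/t_i - 1/t_j)), which is then capped at 1.
    """
    log_ratio = (energy_i - energy_j) * (1.0 / temp_i - 1.0 / temp_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


if __name__ == "__main__":
    # The cold chain (t = 1) holds the worse state, so the swap is always accepted.
    print(exchange_acceptance(5.0, 2.0, 1.0, 2.0))
    # The cold chain already holds the better state, so the swap is accepted only sometimes.
    print(exchange_acceptance(2.0, 5.0, 1.0, 2.0))
```

Because the chains only interact at exchange (and crossover) steps, the moves within each chain can run on separate processors, which is what makes the scheme suitable for a parallel platform.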
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706576
Wei Zhang, S. Emrich, Erliang Zeng
Analysis of gene expression data has emerged as an important approach to discovering active pathways related to biological phenotypes. Previous pathway analysis methods use all genes in a pathway to link it to a particular phenotype. Using only a subset of informative genes, however, could better classify samples. Here, we propose a two-stage machine learning approach for pathway analysis. During the first stage, informative genes that can represent a pathway are selected using feature selection methods. These “representative genes” are mostly associated with the phenotype of interest. In the second stage, pathways are ranked based on their “representative genes” using classification methods. We applied our two-stage approach to three gene expression datasets. The results indicate that our method outperforms methods that consider every gene in a pathway.
Title : A two-stage machine learning approach for pathway analysis. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
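The two stages can be sketched with deliberately simple stand-ins. The abstract names neither the feature selection method nor the classifier, so the class-mean gap filter and the leave-one-out nearest-centroid classifier below are assumptions, and all names are hypothetical. Labels are 0/1, and each class needs at least two samples.

```python
from statistics import mean


def select_genes(expr, labels, pathway, k):
    """Stage 1: keep the k pathway genes with the largest gap between class means."""
    def gap(gene):
        pos = [v for v, y in zip(expr[gene], labels) if y == 1]
        neg = [v for v, y in zip(expr[gene], labels) if y == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(pathway, key=gap, reverse=True)[:k]


def loo_accuracy(expr, labels, genes):
    """Stage 2 score: leave-one-out accuracy of a nearest-centroid classifier."""
    n = len(labels)
    correct = 0
    for i in range(n):
        dist = {}
        for cls in (0, 1):
            # Class centroid computed without the held-out sample i.
            idx = [j for j in range(n) if j != i and labels[j] == cls]
            dist[cls] = sum((expr[g][i] - mean(expr[g][j] for j in idx)) ** 2
                            for g in genes)
        correct += min(dist, key=dist.get) == labels[i]
    return correct / n


def rank_pathways(expr, labels, pathways, k=2):
    """Rank pathways by how well their representative genes classify the samples."""
    # Note: selecting genes on all samples before cross-validation is optimistic;
    # a careful evaluation would repeat the selection inside each fold.
    scores = {name: loo_accuracy(expr, labels, select_genes(expr, labels, genes, k))
              for name, genes in pathways.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here `expr` maps each gene to its expression values across samples, and `pathways` maps each pathway name to its list of member genes.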
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706595
Zina M. Ibrahim, A. Ngom, Ahmed Y. Tawfik
This paper extends our work on using qualitative probability to model the naturally occurring motifs of gene regulatory networks. Having shown in [16] that the qualitative relations defining qualitative probabilistic network (QPN) graphs map directly to the naturally occurring network motifs embedded in gene regulatory networks, we generalize QPN constructs to create a high-level framework from which any regulatory network motif can be derived. Experimental results on time-series data from Saccharomyces cerevisiae show that our approach describes the regulatory motifs in the Saccharomyces cerevisiae gene regulatory network more accurately than our previous definitions.
Title : A dynamic qualitative probabilistic network approach for extracting gene regulatory network motifs. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
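Qualitative probabilistic networks label each influence with a sign (`+`, `-`, `0`, or the ambiguous `?`) and combine signs along chains and across parallel paths with the standard sign-product and sign-sum operators. The sketch below applies those two operators to a three-gene feed-forward loop. Using them to tell coherent loops from incoherent ones is an illustration chosen here, not the framework defined in the paper, and the function names are hypothetical.

```python
def sign_product(a: str, b: str) -> str:
    """Combine signs along a chain of influences (X -> Y -> Z)."""
    if a == "0" or b == "0":
        return "0"
    if a == "?" or b == "?":
        return "?"
    return "+" if a == b else "-"


def sign_sum(a: str, b: str) -> str:
    """Combine signs of parallel influences on the same target."""
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else "?"


def feed_forward_net_sign(x_to_y: str, y_to_z: str, x_to_z: str) -> str:
    """Net qualitative effect of X on Z in a feed-forward loop."""
    indirect = sign_product(x_to_y, y_to_z)
    return sign_sum(x_to_z, indirect)


if __name__ == "__main__":
    print(feed_forward_net_sign("+", "+", "+"))  # both paths agree
    print(feed_forward_net_sign("+", "-", "+"))  # paths disagree, so the net sign is ambiguous
```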
Pub Date : 2010-12-01 DOI: 10.1109/BIBM.2010.5706565
Xumeng Li, F. Feltus, Xiaoqian Sun, Zijun Wang, Feng Luo
Identification of genes and pathways involved in diseases and physiological conditions is a major task in systems biology. In this study, we develop a new non-parameter Ising model that integrates protein-protein interaction network and microarray data to identify differentially expressed (DE) genes. We also propose a simulated annealing algorithm to find the optimal configuration of the Ising model. We test the Ising model on two breast cancer microarray data sets. The results show that the Ising model identifies more cancer-related differentially expressed subnetworks and genes than the Markov random field (MRF) model.
Title : A non-parameter Ising model for network-based identification of differentially expressed genes in recurrent breast cancer patients. In: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
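Searching for a low-energy labeling of a network with simulated annealing can be sketched as follows. The energy function is a textbook Ising form with a per-gene field (standing in for expression evidence) and a uniform coupling over interaction edges. That form, the geometric cooling schedule, and every name are assumptions for illustration; the abstract does not give the authors' non-parameter formulation.

```python
import math
import random


def energy(spins, field, edges, coupling=1.0):
    """Ising energy: spins (+1 = DE, -1 = not DE) are rewarded for matching
    the per-gene field and for agreeing with their network neighbors."""
    e = -sum(field[g] * s for g, s in spins.items())
    e -= coupling * sum(spins[a] * spins[b] for a, b in edges)
    return e


def anneal(field, edges, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Single-spin-flip simulated annealing; returns the best state seen."""
    rng = random.Random(seed)
    genes = sorted(field)
    spins = {g: rng.choice((-1, 1)) for g in genes}
    current = energy(spins, field, edges)
    best, best_energy = dict(spins), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        g = rng.choice(genes)
        spins[g] = -spins[g]
        # Full recomputation keeps the sketch short; a local update is faster.
        proposed = energy(spins, field, edges)
        if proposed <= current or rng.random() < math.exp((current - proposed) / temp):
            current = proposed
            if current < best_energy:
                best, best_energy = dict(spins), current
        else:
            spins[g] = -spins[g]  # reject the move and restore the spin
    return best, best_energy
```

In this toy form, a gene with weak evidence of its own (a small field value) can still be labeled DE when it interacts with a strongly DE neighbor, which is the intuition behind integrating the interaction network with expression data.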