Pub Date : 2009-01-01Epub Date: 2009-04-08DOI: 10.1155/2009/924601
Julien Epps
Many studies of biological sequence data have examined sequence structure in terms of periodicity, and various methods for measuring periodicity have been suggested for this purpose. This paper compares two such methods, autocorrelation and the Fourier transform, using synthetic periodic sequences, and explains the differences in periodicity estimates produced by each. A hybrid autocorrelation-integer period discrete Fourier transform is proposed that combines the advantages of both techniques. Collectively, this representation and a recently proposed variant on the discrete Fourier transform offer alternatives to the widely used autocorrelation for the periodicity characterization of sequence data. Finally, these methods are compared for various tetramers of interest in C. elegans chromosome I.
{"title":"A hybrid technique for the periodicity characterization of genomic sequence data.","authors":"Julien Epps","doi":"10.1155/2009/924601","DOIUrl":"https://doi.org/10.1155/2009/924601","url":null,"abstract":"<p><p>Many studies of biological sequence data have examined sequence structure in terms of periodicity, and various methods for measuring periodicity have been suggested for this purpose. This paper compares two such methods, autocorrelation and the Fourier transform, using synthetic periodic sequences, and explains the differences in periodicity estimates produced by each. A hybrid autocorrelation-integer period discrete Fourier transform is proposed that combines the advantages of both techniques. Collectively, this representation and a recently proposed variant on the discrete Fourier transform offer alternatives to the widely used autocorrelation for the periodicity characterization of sequence data. Finally, these methods are compared for various tetramers of interest in C. elegans chromosome I.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"924601"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/924601","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28183517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingzhou Joe Song, Chris K Lewis, Eric R Lance, Elissa J Chesler, Roumyana Kirova Yordanova, Michael A Langston, Kerrie H Lodowski, Susan E Bergeson
Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from transcriptomic data is addressed. A network reconstruction algorithm was developed that uses statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. The multinomial hypothesis testing-based network reconstruction allows for explicit specification of the false-positive rate, unique from all extant network inference algorithms. The method is superior to dynamic Bayesian network modeling in a simulation study. Temporal gene expression data from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol are used for modeling. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
{"title":"Reconstructing generalized logical networks of transcriptional regulation in mouse brain from temporal gene expression data.","authors":"Mingzhou Joe Song, Chris K Lewis, Eric R Lance, Elissa J Chesler, Roumyana Kirova Yordanova, Michael A Langston, Kerrie H Lodowski, Susan E Bergeson","doi":"10.1155/2009/545176","DOIUrl":"https://doi.org/10.1155/2009/545176","url":null,"abstract":"<p><p>Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. The problem of reconstructing generalized logical networks to account for temporal dependencies among genes and environmental stimuli from transcriptomic data is addressed. A network reconstruction algorithm was developed that uses statistical significance as a criterion for network selection to avoid false-positive interactions arising from pure chance. The multinomial hypothesis testing-based network reconstruction allows for explicit specification of the false-positive rate, unique from all extant network inference algorithms. The method is superior to dynamic Bayesian network modeling in a simulation study. Temporal gene expression data from the brains of alcohol-treated mice in an analysis of the molecular response to alcohol are used for modeling. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"545176"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/545176","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9801481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reverse engineering of gene regulatory networks has been an intensively studied topic in bioinformatics since it constitutes an intermediate step from explorative to causative gene expression analysis. Many methods have been proposed through recent years leading to a wide range of mathematical approaches. In practice, different mathematical approaches will generate different resulting network structures, thus, it is very important for users to assess the performance of these algorithms. We have conducted a comparative study with six different reverse engineering methods, including relevance networks, neural networks, and Bayesian networks. Our approach consists of the generation of defined benchmark data, the analysis of these data with the different methods, and the assessment of algorithmic performances by statistical analyses. Performance was judged by network size and noise levels. The results of the comparative study highlight the neural network approach as best performing method among those under study.
{"title":"Reverse engineering of gene regulatory networks: a comparative study.","authors":"Hendrik Hache, Hans Lehrach, Ralf Herwig","doi":"10.1155/2009/617281","DOIUrl":"https://doi.org/10.1155/2009/617281","url":null,"abstract":"<p><p>Reverse engineering of gene regulatory networks has been an intensively studied topic in bioinformatics since it constitutes an intermediate step from explorative to causative gene expression analysis. Many methods have been proposed through recent years leading to a wide range of mathematical approaches. In practice, different mathematical approaches will generate different resulting network structures, thus, it is very important for users to assess the performance of these algorithms. We have conducted a comparative study with six different reverse engineering methods, including relevance networks, neural networks, and Bayesian networks. Our approach consists of the generation of defined benchmark data, the analysis of these data with the different methods, and the assessment of algorithmic performances by statistical analyses. Performance was judged by network size and noise levels. The results of the comparative study highlight the neural network approach as best performing method among those under study.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"617281"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/617281","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9433273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2009-04-14DOI: 10.1155/2009/491074
Byung-Jun Yoon
When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the "seed" regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.
{"title":"Efficient alignment of RNAs with pseudoknots using sequence alignment constraints.","authors":"Byung-Jun Yoon","doi":"10.1155/2009/491074","DOIUrl":"https://doi.org/10.1155/2009/491074","url":null,"abstract":"<p><p>When aligning RNAs, it is important to consider both the secondary structure similarity and primary sequence similarity to find an accurate alignment. However, algorithms that can handle RNA secondary structures typically have high computational complexity that limits their utility. For this reason, there have been a number of attempts to find useful alignment constraints that can reduce the computations without sacrificing the alignment accuracy. In this paper, we propose a new method for finding effective alignment constraints for fast and accurate structural alignment of RNAs, including pseudoknots. In the proposed method, we use a profile-HMM to identify the \"seed\" regions that can be aligned with high confidence. We also estimate the position range of the aligned bases that are located outside the seed regions. The location of the seed regions and the estimated range of the alignment positions are then used to establish the sequence alignment constraints. We incorporated the proposed constraints into the profile context-sensitive HMM (profile-csHMM) based RNA structural alignment algorithm. Experiments indicate that the proposed method can make the alignment speed up to 11 times faster without degrading the accuracy of the RNA alignment.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"491074"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/491074","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28129009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2008-10-16DOI: 10.1155/2009/362309
Simon Rosenfeld
Two major approaches are known in the field of stochastic dynamics of intracellular biochemical networks. The first one places the focus of attention on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore the regulatory process is essentially discrete and prone to relatively big fluctuations. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. In this paper we outline the third scenario of stochasticity in the regulatory process. This scenario is only conceivable in high-dimensional highly nonlinear systems. In particular, we show that burstiness, a well-known phenomenon in the biology of gene expression, is a natural consequence of high dimensionality coupled with high nonlinearity. In mathematical terms, burstiness is associated with heavy-tailed probability distributions of stochastic processes describing the dynamics of the system. We demonstrate how the "shot" noise originates from purely deterministic behavior of the underlying dynamical system. We conclude that the limiting stochastic process may be accurately approximated by the "heavy-tailed" generalized Pareto process which is a direct mathematical expression of burstiness.
{"title":"Origins of stochasticity and burstiness in high-dimensional biochemical networks.","authors":"Simon Rosenfeld","doi":"10.1155/2009/362309","DOIUrl":"https://doi.org/10.1155/2009/362309","url":null,"abstract":"<p><p>Two major approaches are known in the field of stochastic dynamics of intracellular biochemical networks. The first one places the focus of attention on the fact that many biochemical constituents vitally important for the network functionality may be present only in small quantities within the cell, and therefore the regulatory process is essentially discrete and prone to relatively big fluctuations. The second approach treats the regulatory process as essentially continuous. Complex pseudostochastic behavior in such processes may occur due to multistability and oscillatory motions within limit cycles. In this paper we outline the third scenario of stochasticity in the regulatory process. This scenario is only conceivable in high-dimensional highly nonlinear systems. In particular, we show that burstiness, a well-known phenomenon in the biology of gene expression, is a natural consequence of high dimensionality coupled with high nonlinearity. In mathematical terms, burstiness is associated with heavy-tailed probability distributions of stochastic processes describing the dynamics of the system. We demonstrate how the \"shot\" noise originates from purely deterministic behavior of the underlying dynamical system. We conclude that the limiting stochastic process may be accurately approximated by the \"heavy-tailed\" generalized Pareto process which is a direct mathematical expression of burstiness.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"362309"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/362309","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"27814720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2010-03-02DOI: 10.1155/2009/504069
Youting Sun, Ulisses Braga-Neto, Edward R Dougherty
Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
{"title":"Impact of missing value imputation on classification for DNA microarray gene expression data--a model-based study.","authors":"Youting Sun, Ulisses Braga-Neto, Edward R Dougherty","doi":"10.1155/2009/504069","DOIUrl":"https://doi.org/10.1155/2009/504069","url":null,"abstract":"<p><p>Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":"2009 ","pages":"504069"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/504069","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28772066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2009-04-15DOI: 10.1155/2009/683463
Wentao Zhao, Erchin Serpedin, Edward R Dougherty
Based on time series gene expressions, cyclic genes can be recognized via spectral analysis and statistical periodicity detection tests. These cyclic genes are usually associated with cyclic biological processes, for example, cell cycle and circadian rhythm. The power of a scheme is practically measured by comparing the detected periodically expressed genes with experimentally verified genes participating in a cyclic process. However, in the above mentioned procedure the valuable prior knowledge only serves as an evaluation benchmark, and it is not fully exploited in the implementation of the algorithm. In addition, partial data sets are also disregarded due to their nonstationarity. This paper proposes a novel algorithm to identify cyclic-process-involved genes by integrating the prior knowledge with the gene expression analysis. The proposed algorithm is applied on data sets corresponding to Saccharomyces cerevisiae and Drosophila melanogaster, respectively. Biological evidences are found to validate the roles of the discovered genes in cell cycle and circadian rhythm. Dendrograms are presented to cluster the identified genes and to reveal expression patterns. It is corroborated that the proposed novel identification scheme provides a valuable technique for unveiling pathways related to cyclic processes.
{"title":"Identifying genes involved in cyclic processes by combining gene expression analysis and prior knowledge.","authors":"Wentao Zhao, Erchin Serpedin, Edward R Dougherty","doi":"10.1155/2009/683463","DOIUrl":"https://doi.org/10.1155/2009/683463","url":null,"abstract":"<p><p>Based on time series gene expressions, cyclic genes can be recognized via spectral analysis and statistical periodicity detection tests. These cyclic genes are usually associated with cyclic biological processes, for example, cell cycle and circadian rhythm. The power of a scheme is practically measured by comparing the detected periodically expressed genes with experimentally verified genes participating in a cyclic process. However, in the above mentioned procedure the valuable prior knowledge only serves as an evaluation benchmark, and it is not fully exploited in the implementation of the algorithm. In addition, partial data sets are also disregarded due to their nonstationarity. This paper proposes a novel algorithm to identify cyclic-process-involved genes by integrating the prior knowledge with the gene expression analysis. The proposed algorithm is applied on data sets corresponding to Saccharomyces cerevisiae and Drosophila melanogaster, respectively. Biological evidences are found to validate the roles of the discovered genes in cell cycle and circadian rhythm. Dendrograms are presented to cluster the identified genes and to reveal expression patterns. It is corroborated that the proposed novel identification scheme provides a valuable technique for unveiling pathways related to cyclic processes.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"683463"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/683463","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28129936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nested effects models (NEMs) are a class of probabilistic models that were designed to reconstruct a hidden signalling structure from a large set of observable effects caused by active interventions into the signalling pathway. We give a more flexible formulation of NEMs in the language of Bayesian networks. Our framework constitutes a natural generalization of the original NEM model, since it explicitly states the assumptions that are tacitly underlying the original version. Our approach gives rise to new learning methods for NEMs, which have been implemented in the R/Bioconductor package nem. We validate these methods in a simulation study and apply them to a synthetic lethality dataset in yeast.
{"title":"A Bayesian network view on nested effects models.","authors":"Cordula Zeller, Holger Fröhlich, Achim Tresch","doi":"10.1155/2009/195272","DOIUrl":"https://doi.org/10.1155/2009/195272","url":null,"abstract":"<p><p>Nested effects models (NEMs) are a class of probabilistic models that were designed to reconstruct a hidden signalling structure from a large set of observable effects caused by active interventions into the signalling pathway. We give a more flexible formulation of NEMs in the language of Bayesian networks. Our framework constitutes a natural generalization of the original NEM model, since it explicitly states the assumptions that are tacitly underlying the original version. Our approach gives rise to new learning methods for NEMs, which have been implemented in the R/Bioconductor package nem. We validate these methods in a simulation study and apply them to a synthetic lethality dataset in yeast.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"195272"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/195272","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9785774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2009-04-23DOI: 10.1155/2009/250306
Erchin Serpedin, Javier Garcia-Frias, Yufei Huang, Ulisses Braga-Neto
The recent development of high-throughput molecular genetics technologies has brought a major impact to bioinformatics and systems biology. These technologies have made possible the measurement of the expression profiles of genes and proteins in a highly parallel and integrated fashion. The examination of the huge amounts of genomic and proteomic data holds the promise for understanding the complex interactions between genes and proteins, the functional processes of a cell, and the impact of various factors on a cell, and ultimately, for enabling the design of new technologies for intelligent management of diseases. This special issue focuses on modeling and processing of data arising in bioinformatics, genomics, and proteomics using signal processing methods. The importance of signal processing techniques is due to their important role in extracting, processing, and interpreting the information contained in genomic and proteomic data. It is our hope that signal processing methods will lead to new advances and insights in uncovering the structure, functioning and evolution of biological systems. The special issue consists of nine papers that span a wide range of problems and applications in bioinformatics, genomics, and proteomics such as design of compressive sensing microarrays, analysis of missing values in microarray data, and effect of imputation techniques on post genomic inference methods, RNA sequence alignment, detection of periodicity in genomic sequences and gene expression profiles, clustering and classification of gene and protein expression data, and intervention in probabilistic Boolean networks. Next, we will briefly introduce the papers reported in this special issue. W. Dai et al. analyze how to design a microarray that it is fit for compressive sensing and that captures also the biochemistry of probe-target DNA hybridization. Algorithms and design results are reported for determining probe sequences that satisfy the binding requirements and for evaluating the target concentrations. M. S. B. Sehgal et al. address the general problem of improving post genomic knowledge discovery procedures such as the selection of the most significant genes and inference of gene regulatory networks using missing microarray data imputation techniques. It is shown that instead of neglecting missing data, recycling microarray data via robust imputation techniques can yield substantial performance improvements in the subsequent post genomic discovery procedures. B.-J. Yoon developed a novel efficient and robust approach for fast and accurate structural alignment of RNAs, including pseudoknots. The proposed method turns out to accelerate the dynamic programming algorithm for family-specific models such as profile-csHMMs and CMs, and to be robust to small parameter changes that are present in the model used to predict the constraint. The paper by J. Epps explains in detail the origins of ambiguity in period estimation for
{"title":"Applications of signal processing techniques to bioinformatics, genomics, and proteomics.","authors":"Erchin Serpedin, Javier Garcia-Frias, Yufei Huang, Ulisses Braga-Neto","doi":"10.1155/2009/250306","DOIUrl":"https://doi.org/10.1155/2009/250306","url":null,"abstract":"The recent development of high-throughput molecular genetics technologies has brought a major impact to bioinformatics and systems biology. These technologies have made possible the measurement of the expression profiles of genes and proteins in a highly parallel and integrated fashion. The examination of the huge amounts of genomic and proteomic data holds the promise for understanding the complex interactions between genes and proteins, the functional processes of a cell, and the impact of various factors on a cell, and ultimately, for enabling the design of new technologies for intelligent management of diseases. This special issue focuses on modeling and processing of data arising in bioinformatics, genomics, and proteomics using signal processing methods. The importance of signal processing techniques is due to their important role in extracting, processing, and interpreting the information contained in genomic and proteomic data. It is our hope that signal processing methods will lead to new advances and insights in uncovering the structure, functioning and evolution of biological systems. The special issue consists of nine papers that span a wide range of problems and applications in bioinformatics, genomics, and proteomics such as design of compressive sensing microarrays, analysis of missing values in microarray data, and effect of imputation techniques on post genomic inference methods, RNA sequence alignment, detection of periodicity in genomic sequences and gene expression profiles, clustering and classification of gene and protein expression data, and intervention in probabilistic Boolean networks. Next, we will briefly introduce the papers reported in this special issue. W. Dai et al. analyze how to design a microarray that it is fit for compressive sensing and that captures also the biochemistry of probe-target DNA hybridization. Algorithms and design results are reported for determining probe sequences that satisfy the binding requirements and for evaluating the target concentrations. M. S. B. Sehgal et al. address the general problem of improving post genomic knowledge discovery procedures such as the selection of the most significant genes and inference of gene regulatory networks using missing microarray data imputation techniques. It is shown that instead of neglecting missing data, recycling microarray data via robust imputation techniques can yield substantial performance improvements in the subsequent post genomic discovery procedures. B.-J. Yoon developed a novel efficient and robust approach for fast and accurate structural alignment of RNAs, including pseudoknots. The proposed method turns out to accelerate the dynamic programming algorithm for family-specific models such as profile-csHMMs and CMs, and to be robust to small parameter changes that are present in the model used to predict the constraint. The paper by J. Epps explains in detail the origins of ambiguity in period estimation for ","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"250306"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1155/2009/250306","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"28217201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2009-01-01Epub Date: 2009-04-23DOI: 10.1155/2009/195712
Travis J Hestilow, Yufei Huang
A method for gene clustering from expression profiles using shape information is presented. The conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels and hence allocate genes with similar expression levels into the same cluster. However, genes with similar function often exhibit similarity in signal shape even though the expression magnitude can be far apart. Therefore, this investigation studies clustering according to signal shape similarity. This shape information is captured in the form of normalized and time-scaled forward first differences, which then are subject to a variational Bayes clustering plus a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and assign the components of cluster. Based on initial results for both generated test data and Escherichia coli microarray expression data and initial validation of the Escherichia coli results, it is shown that the method has promise in being able to better cluster time-series microarray data according to shape similarity.
{"title":"Clustering of gene expression data based on shape similarity.","authors":"Travis J Hestilow, Yufei Huang","doi":"10.1155/2009/195712","DOIUrl":"10.1155/2009/195712","url":null,"abstract":"<p><p>A method for gene clustering from expression profiles using shape information is presented. The conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels and hence allocate genes with similar expression levels into the same cluster. However, genes with similar function often exhibit similarity in signal shape even though the expression magnitude can be far apart. Therefore, this investigation studies clustering according to signal shape similarity. This shape information is captured in the form of normalized and time-scaled forward first differences, which then are subject to a variational Bayes clustering plus a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and assign the components of cluster. Based on initial results for both generated test data and Escherichia coli microarray expression data and initial validation of the Escherichia coli results, it is shown that the method has promise in being able to better cluster time-series microarray data according to shape similarity.</p>","PeriodicalId":72957,"journal":{"name":"EURASIP journal on bioinformatics & systems biology","volume":" ","pages":"195712"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171421/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9425663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}