IEEE International Workshop on Genomic Signal Processing and Statistics : [proceedings]. IEEE International Workshop on Genomic Signal Processing and Statistics: Latest Articles
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735918
Po-Yen Wu, John H Phan, May D Wang
One way to gain a more comprehensive picture of the complex function of a cell is to study the transcriptome. A promising technology for studying the transcriptome is RNA sequencing, an application of which is to quantify elements in the transcriptome and to link quantitative observations to biology. Although numerous quantification algorithms are publicly available, no method of systematically assessing these algorithms has been developed. To meet the need for such an assessment, we present an approach that includes (1) simulated and real datasets, (2) three alignment strategies, and (3) six quantification algorithms. Examining the normalized root-mean-square error, the percentage error of the coefficient of variation, and the distribution of the coefficient of variation, we found that quantification algorithms with the input of sequence alignment reported in the transcriptomic coordinate usually performed better in terms of the multiple metrics proposed in this study.
Title: An Approach for Assessing RNA-seq Quantification Algorithms in Replication Studies.
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981182/pdf/nihms806776.pdf
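As an illustration of the metrics named in the abstract above, the following Python sketch computes a normalized root-mean-square error and the percentage error of the coefficient of variation. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the RMSE is normalized (the range of the true values is assumed here) or how the CV error is defined (a relative absolute difference is assumed), and all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def nrmse(estimated, truth):
    """RMSE divided by the range of the true values (assumed normalization)."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(mean([(e - t) ** 2 for e, t in zip(estimated, truth)]))
    spread = max(truth) - min(truth)
    if spread == 0:
        raise ValueError("true values have zero range")
    return rmse / spread


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean across replicates."""
    center = mean(replicates)
    if center == 0:
        raise ValueError("mean of replicates is zero")
    return stdev(replicates) / center


def cv_percentage_error(estimated_replicates, true_replicates):
    """Relative absolute difference between estimated and true CV, in percent (assumed definition)."""
    cv_true = coefficient_of_variation(true_replicates)
    if cv_true == 0:
        raise ValueError("true CV is zero")
    cv_est = coefficient_of_variation(estimated_replicates)
    return 100.0 * abs(cv_est - cv_true) / cv_true
```

In a replication study, `nrmse` would be applied per sample across transcripts, while the CV functions would be applied per transcript across replicate samples.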
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735914
Rolando J Olivares, Arvind Rao, Jeffrey S Morris, Veerabhadran Baladandayuthapani
We propose a method to integrate high-dimensional genomics datasets across multiple platforms with multiple imaging outcomes. This new statistical framework uses a hierarchical model to integrate biological relationships across platforms to identify genes that associate with multiple correlated imaging outcomes. Our two-stage hierarchical model uses the information shared across the platforms, thus increasing the predictive power to identify the relevant genes. We assess the performance of our proposed method through simulation and apply it to data obtained from The Cancer Genome Atlas Glioblastoma Multiforme dataset. Our proposed method discovers multiple copy number- and microRNA-regulated genes that are related to patients' imaging outcomes in glioblastoma.
Title: Integrative Analysis of Multi-modal Correlated Imaging-Genomics Data in Glioblastoma.
Pub Date : 2013-11-01 DOI: 10.1109/GENSIPS.2013.6735913
Anindya Bhadra, Veerabhadran Baladandayuthapani
While individual studies have demonstrated that mRNA expressions are affected by copy number aberrations and microRNAs, their integrative analysis has largely been ignored. In this article, we use recently developed high-dimensional regression techniques to perform the integrative analysis of such data in the context of Glioblastoma Multiforme (GBM). It is revealed that copy numbers are more potent regulators of mRNA levels than microRNAs. We also infer the mRNA expression network after adjusting for the effect of microRNAs and copy numbers. Our association analysis demonstrates that the expression levels of the genes IRS1 and GRB2 are strongly associated with the underlying variation in copy numbers, but we fail to detect significant associations with microRNA levels.
Title: Integrative Sparse Bayesian Analysis of High-dimensional Multi-platform Genomic Data in Glioblastoma.
Pub Date : 2012-12-01 DOI: 10.1109/GENSIPS.2012.6507729
John H Phan, Po-Yen Wu, May D Wang
Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. Thus, a typical RNA-Seq pipeline includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. The choice of mapping algorithm could affect quantification accuracy on a per-dataset basis. Thus, we describe a method that enables the merging of transcriptome and reference genome mapping stages provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline by assessing its potential for novel isoform discovery and by validating its quantification performance using qRT-PCR.
Title: Improving the Flexibility of RNA-Seq Data Analysis Pipelines.
Pub Date : 2012-12-01 DOI: 10.1109/GENSIPS.2012.6507747
Yanxun Xu, Jie Zhang, Yuan Yuan, Riten Mitra, Peter Müller, Yuan Ji
We integrate three TCGA data sets including measurements on matched DNA copy numbers (C), DNA methylation (M), and mRNA expression (E) over 500+ ovarian cancer samples. The integrative analysis is based on a Bayesian graphical model treating the three types of measurements as three vertices in a network. The graph is used as a convenient way to parameterize and display the dependence structure. Edges connecting vertices imply specific types of regulatory relationships. For example, an edge between M and E together with the lack of an edge between C and E implies methylation-controlled transcription, which is robust to copy number changes. In other words, the mRNA expression is sensitive to methylation variation but not to copy number variation. We apply the graphical model to each of the genes in the TCGA data independently and provide a comprehensive list of inferred profiles. Examples based on simulated data are provided as well.
Title: A Bayesian Graphical Model for Integrative Analysis of TCGA Data.
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4387199/pdf/nihms673684.pdf
Pub Date : 2012-12-01 DOI: 10.1109/GENSIPS.2012.6507742
Riten Mitra, Peter Mueller, Yuan Ji, Gordon Mills, Yiling Lu
Advances in functional proteomic technologies have significantly enriched our knowledge of protein functions and their interactions in bio-molecular pathways. We discuss inference for RPPA (reverse phase protein array) data that measure the expression of the protein markers over time. We exploit the dynamical nature of the experiment to build a directed network of protein interactions. For this, we employ a Bayesian graphical model with an informative prior that favors sparsity. Conditional on the network, we model dependence at the level of latent binary indicators rather than the raw expression measurements. One of the key features of the proposed approach is a hierarchical model that allows for the dependence structure to be shared across different experiments, in the case of the motivating application across different drugs and doses. This is critical to facilitate meaningful inference with the limited available sample sizes. The second key feature is a sparsity inducing prior on the dependence structure. We show an application of the method to data measuring abundance of phosphorylated proteins in a human ovarian cell line.
Title: Sparse Bayesian Graphical Models for RPPA Time Course Data.
Pub Date : 2011-01-01 DOI: 10.1109/GENSiPS.2011.6169426
Benjamin Rodriguez, Hok-Hei Tam, David Frankhouser, Michael Trimarchi, Mark Murphy, Chris Kuo, Deval Parikh, Bryan Ball, Sebastian Schwind, John Curfman, William Blum, Guido Marcucci, Pearlly Yan, Ralf Bundschuh
Advances in whole genome profiling have revolutionized the cancer research field, but at the same time have raised new bioinformatics challenges. For next generation sequencing (NGS), these include data storage, computational costs, sequence processing and alignment, delineating appropriate statistical measures, and data visualization. The NGS application MethylCap-seq involves the in vitro capture of methylated DNA and subsequent analysis of enriched fragments by massively parallel sequencing. Here, we present a scalable, flexible workflow for MethylCap-seq Quality Control, secondary data analysis, tertiary analysis of multiple experimental groups, and data visualization. This workflow and its suite of features will assist biologists in conducting methylation profiling projects and facilitate meaningful biological interpretation.
Title: A Scalable, Flexible Workflow for MethylCap-Seq Data Analysis.
Pub Date : 2011-01-01 DOI: 10.1109/GENSiPS.2011.6169462
Peilin Jia, Zhongming Zhao
The recent success of genome-wide association (GWA) studies has greatly expanded our understanding of many complex diseases by delivering previously unknown loci and genes. A large number of GWAS datasets have already been made available, with more being generated. To explore the underlying moderate and weak signals, we recently developed a network-based dense module search (DMS) method for identification of disease candidate genes from GWAS datasets, leveraging on the joint effect of multiple genes. DMS is designed to dynamically search for the best nodes in a step-wise fashion and, thus, could overcome the limitation of pre-defined gene sets. Here, we propose an improved version of DMS, the topologically-adjusted DMS, to facilitate the analysis of complex diseases. Building on the previous version of DMS, we improved the randomization process by taking into account the topological character, aiming to adjust the bias potentially caused by high-degree nodes in the whole network. We demonstrated the topologically-adjusted DMS algorithm in a GWAS dataset for schizophrenia. We found the improved DMS strategy could effectively identify candidate genes while reducing the burden of high-degree nodes. In our evaluation, we found more candidate genes identified by the topologically-adjusted DMS algorithm have been reported in the previous association studies, suggesting this new algorithm has better performance than the unweighted DMS algorithm. Finally, our functional analysis of the top module genes revealed that they are enriched in immune-related pathways.
Title: Network-assisted Causal Gene Detection in Genome-wide Association Studies: An Improved Module Search Algorithm.
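One way to picture the topology-aware randomization described in the abstract above is to compare a module's score against random gene sets drawn from the same node-degree bins, so that high-degree nodes are only ever compared with other high-degree nodes. The Python sketch below is illustrative only: the module score (sum of gene z-scores divided by the square root of the module size), the degree bin edges, and all function names are assumptions, not details taken from the abstract.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def module_score(genes, z):
    """Assumed module score: sum of gene z-scores over sqrt(module size)."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))


def degree_bin(degree, edges=(1, 5, 20)):
    """Map a node degree to a bin index; the bin edges are hypothetical."""
    return sum(degree > e for e in edges)


def normalized_score(module, z, degree, n_random=1000, seed=0):
    """Standardize a module score against degree-matched random gene sets."""
    rng = random.Random(seed)
    # Group all genes in the network by degree bin.
    pools = {}
    for gene, d in degree.items():
        pools.setdefault(degree_bin(d), []).append(gene)
    # Count how many module genes fall into each degree bin.
    needed = Counter(degree_bin(degree[g]) for g in module)
    null = []
    for _ in range(n_random):
        # Draw the same number of distinct genes from each bin as the module has.
        random_module = [g for b, k in needed.items() for g in rng.sample(pools[b], k)]
        null.append(module_score(random_module, z))
    spread = stdev(null)
    if spread == 0:
        raise ValueError("null scores have zero variance")
    return (module_score(module, z) - mean(null)) / spread
```

Here `z` maps each gene to a z-score derived from its GWAS P value and `degree` maps each gene to its number of interactors in the network; both inputs are placeholders for whatever the analysis actually provides.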
MicroRNAs (miRNAs) are 21- or 22-nucleotide noncoding RNAs known to possess important post-transcriptional regulatory functions. Identifying the target genes that miRNAs regulate is important for understanding their specific biological functions. Usually, miRNAs down-regulate target genes by binding to the complementary sites in the 3' untranslated region (UTR) of the targets. Since the binding of animal miRNAs is not a perfect one-to-one match with the complementary sites of their targets, it is difficult to find targets of animal miRNAs by assessing their alignment to the 3' UTRs of potential targets. More sophisticated computational approaches are desirable and have been proposed as a result. The most popular algorithms include TargetScan, miRanda, and PicTar. However, they share similar methodology and are restricted by human observation of the conserved nature of miRNAs and their targets. In this article, we develop a statistical learning based approach that uses a support vector machine (SVM) as a classifier to predict miRNA targets. SVMs have been applied in many fields such as pattern recognition, computational biology, and medical image analysis. With an SVM, information is gained automatically from relevant data and therefore human bias can be removed from the decision process.
Title: A Machine Learning Approach for miRNA Target Prediction.
Hui Liu, Dong Yue, Lin Zhang, Zhiqiang Bai, Xiufen Lei, Shou-Jiang Gao, Yufei Huang
Pub Date : 2008-06-08 DOI: 10.1109/GENSIPS.2008.4555655
Pub Date : 2008-01-01 DOI: 10.1109/GENSIPS.2008.4555659
Jia Meng, Shou-Jiang Gao, Yufei Huang
An algorithm for the discovery of time-varying modules using genome-wide expression data is presented here. When applied to large-scale time series data, our method is designed to discover not only the transcription modules but also their timing information, which is rarely annotated by existing approaches. Rather than assuming the commonly defined time-constant transcription modules, a module is depicted as a set of genes that are co-regulated during a specific period of time, i.e., a time dependent transcription module (TDTM). A rigorous mathematical definition of TDTM is provided, which serves as an objective function for retrieving modules. Based on the definition, an effective signature algorithm is proposed that iteratively searches for the transcription modules in the time series data. The proposed method was tested on simulated systems and applied to human time series microarray data collected during Kaposi's sarcoma-associated herpesvirus (KSHV) infection. The result has been verified by Expression Analysis Systematic Explorer.
Title: An Iterative Time Windowed Signature Algorithm for Time Dependent Transcription Module Discovery.