
IEEE International Workshop on Genomic Signal Processing and Statistics: Latest Articles

An Approach for Assessing RNA-seq Quantification Algorithms in Replication Studies.
Po-Yen Wu, John H Phan, May D Wang

One way to gain a more comprehensive picture of the complex function of a cell is to study the transcriptome. A promising technology for studying the transcriptome is RNA sequencing (RNA-seq), one application of which is to quantify elements of the transcriptome and link quantitative observations to biology. Although numerous quantification algorithms are publicly available, no method for systematically assessing them has been developed. To meet this need, we present an assessment approach that includes (1) simulated and real datasets, (2) three alignment strategies, and (3) six quantification algorithms. Examining the normalized root-mean-square error, the percentage error of the coefficient of variation, and the distribution of the coefficient of variation, we found that quantification algorithms that take as input sequence alignments reported in transcriptomic coordinates usually performed better on the multiple metrics proposed in this study.
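The abstract names its metrics but not their exact formulas. A minimal sketch, assuming RMSE normalized by the range of the true values (normalizing by the mean or standard deviation is also common) and a signed percentage error of the per-transcript coefficient of variation (CV) computed across replicates, could look like this. The function names are hypothetical.

```python
import math
import statistics


def nrmse(estimated, true):
    """Root-mean-square error divided by the range of the true values.

    Range normalization is an assumed convention for this sketch.
    """
    if len(estimated) != len(true) or len(true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mse = sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)
    spread = max(true) - min(true)
    if spread == 0:
        raise ValueError("true values have zero range")
    return math.sqrt(mse) / spread


def coefficient_of_variation(replicates):
    """Sample standard deviation over mean, for one transcript across replicates."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


def cv_percent_error(estimated_replicates, true_replicates):
    """Signed percentage error of the estimated CV relative to the true CV."""
    cv_true = coefficient_of_variation(true_replicates)
    cv_est = coefficient_of_variation(estimated_replicates)
    return 100.0 * (cv_est - cv_true) / cv_true


# Hypothetical values for three transcripts: estimated vs. simulated truth.
print(nrmse([2.0, 2.0, 4.0], [1.0, 2.0, 3.0]))
# One hypothetical transcript measured in three replicates.
print(cv_percent_error([8.0, 10.0, 12.0], [9.0, 10.0, 11.0]))
```

In a replication study, `cv_percent_error` would be evaluated per transcript, and the distribution of those CVs is what the third metric summarizes.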

DOI: 10.1109/GENSIPS.2013.6735918. Published: 2013-11-01. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981182/pdf/nihms806776.pdf
Citations: 0
Integrative Analysis of Multi-modal Correlated Imaging-Genomics Data in Glioblastoma.
Rolando J Olivares, Arvind Rao, Jeffrey S Morris, Veerabhadran Baladandayuthapani

We propose a method to integrate high-dimensional genomics datasets from multiple platforms with multiple imaging outcomes. This new statistical framework uses a hierarchical model to integrate biological relationships across platforms and identify genes associated with multiple correlated imaging outcomes. Our two-stage hierarchical model uses the information shared across platforms, thus increasing the predictive power to identify the relevant genes. We assess the performance of the proposed method through simulation and apply it to data from The Cancer Genome Atlas (TCGA) Glioblastoma Multiforme dataset. The method discovers multiple copy-number- and microRNA-regulated genes that are related to patients' imaging outcomes in glioblastoma.

DOI: 10.1109/GENSIPS.2013.6735914. Published: 2013-11-01.
Citations: 2
Integrative Sparse Bayesian Analysis of High-dimensional Multi-platform Genomic Data in Glioblastoma.
Anindya Bhadra, Veerabhadran Baladandayuthapani

While individual studies have demonstrated that mRNA expression is affected by copy number aberrations and microRNAs, the integrative analysis of these data types has largely been ignored. In this article, we use recently developed high-dimensional regression techniques to perform an integrative analysis of such data in the context of Glioblastoma Multiforme (GBM). The analysis reveals that copy numbers are more potent regulators of mRNA levels than microRNAs. We also infer the mRNA expression network after adjusting for the effects of microRNAs and copy numbers. Our association analysis demonstrates that the expression levels of the genes IRS1 and GRB2 are strongly associated with the underlying variation in copy numbers, but we fail to detect significant associations with microRNA levels.
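The paper's sparse Bayesian regression is not reproduced here. As a hedged, frequentist stand-in, the sketch below fits a lasso (L1-penalized least squares, by cyclic coordinate descent) to simulated data in which, by construction of the example, mRNA depends on copy number but not on two hypothetical microRNA features. It only illustrates how a sparse high-dimensional regression can separate potent regulators from irrelevant ones.

```python
import random


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit y ~ X with an L1 penalty by cyclic coordinate descent.

    Objective: (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X and y are already centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # Soft-thresholding step: small correlations are set exactly to zero.
            if rho > lam:
                new = (rho - lam) / col_sq[j]
            elif rho < -lam:
                new = (rho + lam) / col_sq[j]
            else:
                new = 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
            beta[j] = new
    return beta


def center(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


# Simulated data (assumption): mRNA = 1.5 * copy_number + noise; miRNAs irrelevant.
rng = random.Random(0)
n = 500
copy_number = center([rng.gauss(0, 1) for _ in range(n)])
mirna_1 = center([rng.gauss(0, 1) for _ in range(n)])
mirna_2 = center([rng.gauss(0, 1) for _ in range(n)])
mrna = center([1.5 * c + rng.gauss(0, 0.5) for c in copy_number])
X = [list(row) for row in zip(copy_number, mirna_1, mirna_2)]
beta = lasso_coordinate_descent(X, mrna, lam=0.1)
print(dict(zip(["copy_number", "miRNA_1", "miRNA_2"], beta)))
```

The penalty `lam` controls sparsity; a Bayesian treatment would instead place a shrinkage prior on the coefficients and report posterior inclusion, which this sketch does not attempt.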

DOI: 10.1109/GENSIPS.2013.6735913. Published: 2013-11-01.
Citations: 2
Improving the Flexibility of RNA-Seq Data Analysis Pipelines.
John H Phan, Po-Yen Wu, May D Wang

Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. A typical RNA-Seq pipeline therefore includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but they are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. Because the choice of mapping algorithm can affect quantification accuracy on a per-dataset basis, we describe a method that enables the outputs of the transcriptome and reference genome mapping stages to be merged, provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline by assessing its potential for novel isoform discovery and by validating its quantification performance using qRT-PCR.
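The abstract does not spell out the merge procedure. A minimal sketch, assuming a sequential design in which transcriptome alignments are kept for reads that mapped there and genome-stage alignments are used only for reads the transcriptome stage left unmapped, needs only the standard SAM fields QNAME, FLAG, RNAME and POS plus FLAG bit 0x4 (segment unmapped). All function names are hypothetical. Because each stage can use a different aligner, this is where the flexibility comes from; note that the two stages report positions in different coordinate systems (transcript vs. chromosome), which a real pipeline would have to track or convert.

```python
from typing import Iterable, Iterator, List, Tuple

SamRecord = Tuple[str, int, str, int]  # (QNAME, FLAG, RNAME, POS)
FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def parse_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield the first four mandatory SAM fields, skipping header lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[2], int(fields[3])


def is_mapped(record: SamRecord) -> bool:
    return (record[1] & FLAG_UNMAPPED) == 0


def merge_stages(transcriptome_sam: Iterable[str], genome_sam: Iterable[str]) -> List[SamRecord]:
    """Hypothetical merge: transcriptome hits first, genome hits only for the remaining reads."""
    tx_mapped = [rec for rec in parse_sam(transcriptome_sam) if is_mapped(rec)]
    tx_names = {rec[0] for rec in tx_mapped}
    genome_extra = [
        rec for rec in parse_sam(genome_sam)
        if is_mapped(rec) and rec[0] not in tx_names
    ]
    return tx_mapped + genome_extra


tx = [
    "@HD\tVN:1.6",
    "r1\t0\ttx_A\t120\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
genome = [
    "r1\t0\tchr1\t5000\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr2\t700\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
for record in merge_stages(tx, genome):
    print(record)
```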

DOI: 10.1109/GENSIPS.2012.6507729. Published: 2012-12-01.
Citations: 1
A Bayesian Graphical Model for Integrative Analysis of TCGA Data.
Yanxun Xu, Jie Zhang, Yuan Yuan, Riten Mitra, Peter Müller, Yuan Ji

We integrate three TCGA data sets comprising matched measurements of DNA copy number (C), DNA methylation (M), and mRNA expression (E) for more than 500 ovarian cancer samples. The integrative analysis is based on a Bayesian graphical model that treats the three types of measurements as three vertices in a network. The graph provides a convenient way to parameterize and display the dependence structure, and edges connecting vertices indicate specific types of regulatory relationships. For example, an edge between M and E together with no edge between C and E implies methylation-controlled transcription that is robust to copy number changes; in other words, mRNA expression is sensitive to methylation variation but not to copy number variation. We apply the graphical model to each gene in the TCGA data independently and provide a comprehensive list of inferred profiles. Examples based on simulated data are also provided.
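To make the edge-pattern reasoning concrete, the sketch below uses a frequentist stand-in, not the paper's Bayesian model: in a Gaussian graphical model an absent edge corresponds to a zero partial correlation, so edges are kept when the absolute partial correlation exceeds an assumed threshold of 0.1. The simulated gene (E driven by M only) and all labels other than the methylation-controlled case are hypothetical.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def infer_edges(data, threshold=0.1):
    """Keep an edge between two vertices if |partial correlation| exceeds threshold."""
    names = ["C", "M", "E"]
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (other,) = [v for v in names if v not in (a, b)]
            if abs(partial_corr(data[a], data[b], data[other])) > threshold:
                edges.add(frozenset((a, b)))
    return edges


def interpret(edges):
    """Labels for edge patterns touching E (only the first is stated in the abstract)."""
    m_e = frozenset(("M", "E")) in edges
    c_e = frozenset(("C", "E")) in edges
    if m_e and not c_e:
        return "methylation-controlled transcription, robust to copy number"
    if c_e and not m_e:
        return "copy-number-driven transcription, robust to methylation"
    if m_e and c_e:
        return "expression sensitive to both copy number and methylation"
    return "no direct dependence of expression on C or M"


# Simulated gene (assumption): E depends on M only; C is independent.
rng = random.Random(1)
n = 5000
data = {"C": [rng.gauss(0, 1) for _ in range(n)], "M": [rng.gauss(0, 1) for _ in range(n)]}
data["E"] = [0.8 * m + rng.gauss(0, 0.6) for m in data["M"]]
edges = infer_edges(data)
print(sorted(tuple(sorted(e)) for e in edges), interpret(edges))
```

A Bayesian version would replace the hard threshold with posterior probabilities of each edge, which is what allows per-gene profiles to carry uncertainty.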

DOI: 10.1109/GENSIPS.2012.6507747. Published: 2012-12-01. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4387199/pdf/nihms673684.pdf
Citations: 0
Sparse Bayesian Graphical Models for RPPA Time Course Data.
Riten Mitra, Peter Mueller, Yuan Ji, Gordon Mills, Yiling Lu

Advances in functional proteomic technologies have significantly enriched our knowledge of protein functions and their interactions in bio-molecular pathways. We discuss inference for RPPA (reverse phase protein array) data that measure the expression of protein markers over time. We exploit the dynamic nature of the experiment to build a directed network of protein interactions. For this, we employ a Bayesian graphical model with an informative prior that favors sparsity. Conditional on the network, we model dependence at the level of latent binary indicators rather than the raw expression measurements. One key feature of the proposed approach is a hierarchical model that allows the dependence structure to be shared across different experiments; in the motivating application, these are experiments across different drugs and doses. This sharing is critical for meaningful inference with the limited available sample sizes. The second key feature is a sparsity-inducing prior on the dependence structure. We show an application of the method to data measuring the abundance of phosphorylated proteins in a human ovarian cell line.
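The abstract does not give the form of its sparsity-inducing prior. One simple prior with that property, offered here as an assumption rather than the paper's choice, treats each possible directed edge as an independent Bernoulli(p) indicator: with p < 0.5 every additional edge lowers the log prior. The protein names below are hypothetical placeholders.

```python
import math
from itertools import permutations


def log_sparsity_prior(edges, nodes, p=0.1):
    """Log prior of a directed graph with independent Bernoulli(p) edge indicators.

    Edges are (source, target) pairs; self-loops are excluded.
    With p < 0.5, each extra edge lowers the prior, so sparse graphs are favored.
    """
    possible = set(permutations(nodes, 2))
    edge_set = set(edges)
    if not edge_set <= possible:
        raise ValueError("edge refers to an unknown node or is a self-loop")
    k = len(edge_set)
    return k * math.log(p) + (len(possible) - k) * math.log(1 - p)


proteins = ["P1", "P2", "P3", "P4"]  # hypothetical protein markers
sparse = [("P1", "P3")]
dense = [("P1", "P3"), ("P2", "P3"), ("P4", "P3"), ("P1", "P2"), ("P3", "P1"), ("P2", "P4")]
print(log_sparsity_prior(sparse, proteins), log_sparsity_prior(dense, proteins))
```

In a full posterior this term is added to the log likelihood of the (latent binary) data given each graph, so a dense graph must explain the data substantially better to be preferred.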

DOI: 10.1109/GENSIPS.2012.6507742. Published: 2012-12-01.
Citations: 5
A Scalable, Flexible Workflow for MethylCap-Seq Data Analysis.
Benjamin Rodriguez, Hok-Hei Tam, David Frankhouser, Michael Trimarchi, Mark Murphy, Chris Kuo, Deval Parikh, Bryan Ball, Sebastian Schwind, John Curfman, William Blum, Guido Marcucci, Pearlly Yan, Ralf Bundschuh

Advances in whole genome profiling have revolutionized the cancer research field, but they have also raised new bioinformatics challenges. For next generation sequencing (NGS), these include data storage, computational costs, sequence processing and alignment, delineating appropriate statistical measures, and data visualization. The NGS application MethylCap-seq involves the in vitro capture of methylated DNA and subsequent analysis of the enriched fragments by massively parallel sequencing. Here, we present a scalable, flexible workflow for MethylCap-seq quality control, secondary data analysis, tertiary analysis of multiple experimental groups, and data visualization. This workflow and its suite of features will assist biologists in conducting methylation profiling projects and facilitate meaningful biological interpretation.

DOI: 10.1109/GENSiPS.2011.6169426. Published: 2011-01-01.
Citations: 6
Network-assisted Causal Gene Detection in Genome-wide Association Studies: An Improved Module Search Algorithm.
Peilin Jia, Zhongming Zhao

The recent success of genome-wide association (GWA) studies has greatly expanded our understanding of many complex diseases by revealing previously unknown loci and genes. A large number of GWAS datasets have already been made available, and more are being generated. To explore the underlying moderate and weak signals, we recently developed a network-based dense module search (DMS) method that identifies disease candidate genes from GWAS datasets by leveraging the joint effect of multiple genes. DMS is designed to dynamically search for the best nodes in a step-wise fashion and thus can overcome the limitations of pre-defined gene sets. Here, we propose an improved version, the topologically-adjusted DMS, to facilitate the analysis of complex diseases. Building on the previous version, we improved the randomization process by taking topological characteristics into account, aiming to adjust for the bias potentially caused by high-degree nodes in the whole network. We demonstrated the topologically-adjusted DMS algorithm on a GWAS dataset for schizophrenia and found that the improved strategy could effectively identify candidate genes while reducing the burden of high-degree nodes. In our evaluation, more of the candidate genes identified by the topologically-adjusted DMS algorithm had been reported in previous association studies, suggesting that the new algorithm performs better than the unweighted DMS algorithm. Finally, our functional analysis of the top module genes revealed that they are enriched in immune-related pathways.
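The abstract does not state the DMS scoring rule. A minimal sketch of a generic step-wise dense-module search, assuming gene-level p-values converted to z-scores, a module score Z_m = (sum of z) / sqrt(k) for a module of k genes, and an expansion rule that accepts the best neighbor only if it raises the score by more than a factor (1 + r), is shown below. These are assumptions for illustration, the gene names are hypothetical, and the topology-aware randomization that the improved version adds is not implemented here.

```python
import math
from statistics import NormalDist


def p_to_z(p):
    """Convert a gene-level p-value to a z-score via the inverse normal CDF."""
    return NormalDist().inv_cdf(1 - p)


def module_score(module, z):
    """Z_m = sum of node z-scores / sqrt(module size)."""
    return sum(z[g] for g in module) / math.sqrt(len(module))


def dense_module_search(graph, z, seed, r=0.1):
    """Greedily grow a module from `seed` while the score improves by more than r."""
    module = {seed}
    score = module_score(module, z)
    while True:
        neighbors = {nb for g in module for nb in graph[g]} - module
        if not neighbors:
            return module
        # Pick the neighbor whose addition gives the highest module score.
        best = max(sorted(neighbors), key=lambda nb: module_score(module | {nb}, z))
        best_score = module_score(module | {best}, z)
        if best_score <= score * (1 + r):
            return module
        module.add(best)
        score = best_score


# Hypothetical protein-interaction network and gene z-scores.
graph = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
z = {"A": 2.0, "B": 2.0, "C": 1.5, "D": 0.1, "E": 0.2}
print(sorted(dense_module_search(graph, z, "A")))
```

Dividing by sqrt(k) keeps large modules from winning merely by size; a randomization step would then compare each module's score with scores of random modules of the same size (and, in a topology-aware variant, similar degree).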

DOI: 10.1109/GENSiPS.2011.6169462. Published: 2011-01-01.
Citations: 5
A MACHINE LEARNING APPROACH FOR miRNA TARGET PREDICTION.
Hui Liu, Dong Yue, Lin Zhang, Zhiqiang Bai, Xiufen Lei, Shou-Jiang Gao, Yufei Huang

MicroRNAs (miRNAs) are noncoding RNAs of 21 or 22 nucleotides that are known to have important post-transcriptional regulatory functions. Identifying the target genes that miRNAs regulate is important for understanding their specific biological functions. miRNAs usually down-regulate target genes by binding to complementary sites in the 3' untranslated region (UTR) of the targets. Because animal miRNAs do not bind with a perfect one-to-one match to the complementary sites of their targets, it is difficult to find their targets by assessing their alignment to the 3' UTRs of potential targets. More sophisticated computational approaches are therefore desirable, and several have been proposed. The most popular algorithms include TargetScan, miRanda, and PicTar. However, they share a similar methodology and are restricted by human observations of the conserved nature of miRNAs and their targets. In this article, we develop a statistical learning approach that uses a support vector machine (SVM) as a classifier to predict miRNA targets. SVMs have been applied in many fields, such as pattern recognition, computational biology, and medical image analysis. With an SVM, information is learned automatically from the relevant data, and human bias can therefore be removed from the decision process.
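The abstract names an SVM classifier but not its features, kernel, or training procedure. A hedged sketch of the general idea: a linear soft-margin SVM trained with the Pegasos stochastic subgradient method (a swapped-in training algorithm, not necessarily the paper's) on two hypothetical, standardized candidate-site features, with +1 marking a target and -1 a non-target.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, n_steps=5000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (regularized along with the other weights, a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    for t in range(1, n_steps + 1):
        i = rng.randrange(len(data))
        x, y = data[i], labels[i]
        eta = 1.0 / (lam * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink toward zero (regularization), then step on the hinge loss if violated.
        w = [(1 - eta * lam) * wj for wj in w]
        if margin < 1:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, features):
    score = sum(wj * xj for wj, xj in zip(w, list(features) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical standardized features per candidate site, e.g. (seed pairing, accessibility).
X = [(2.0, 1.5), (1.5, 2.5), (3.0, 2.0), (-2.0, -1.0), (-1.5, -2.5), (-3.0, -2.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, (3.0, 3.0)), predict(w, (-3.0, -3.0)))
```

The weights are learned from labeled examples rather than hand-set rules, which is the sense in which the abstract says human bias is removed; choosing informative features remains a modeling decision.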
DOI: 10.1109/GENSIPS.2008.4555655. Published: 2008-06-08.
Citations: 5
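The abstract does not give the features, kernel, or training procedure used in the paper, so the following is only a minimal sketch of the general idea: a linear SVM that labels candidate miRNA/3' UTR pairs as targets (+1) or non-targets (-1). It assumes Pegasos-style stochastic sub-gradient descent on the primal hinge-loss objective, which is a swapped-in training method and not necessarily the authors' one. The three features, the toy data, and the names `train_linear_svm`, `predict` and `augment` are all hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def augment(x: Sequence[float]) -> Vector:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos-style stochastic sub-gradient descent on the primal objective
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>), labels y_i in {-1, +1}."""
    rng = random.Random(seed)          # seeded so training is reproducible
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = y[i] * dot(w, data[i]) < 1.0    # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]   # step on the regularizer
            if violated:
                # step on the hinge-loss sub-gradient for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    # +1 = predicted target, -1 = predicted non-target
    return 1 if dot(w, augment(x)) >= 0.0 else -1


# Hypothetical, made-up feature vectors for candidate miRNA/3' UTR pairs:
# [perfect seed match (0/1), scaled duplex stability, conservation score]
X_train = [
    [1.0, 0.9, 0.8], [1.0, 0.8, 0.9], [1.0, 0.7, 0.6],   # labelled targets
    [0.0, 0.2, 0.1], [0.0, 0.3, 0.2], [0.0, 0.1, 0.3],   # labelled non-targets
]
y_train = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
train_predictions = [predict(w, x) for x in X_train]
```

The weights come entirely from the labelled examples rather than from hand-set rules, which is the property the abstract attributes to the SVM approach; a practical predictor would need real labelled pairs, more informative sequence features, and possibly a nonlinear kernel.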
An Iterative Time Windowed Signature Algorithm for Time Dependent Transcription Module Discovery.
Jia Meng, Shou-Jiang Gao, Yufei Huang

An algorithm for discovering time-varying modules from genome-wide expression data is presented here. When applied to large-scale time-series data, our method is designed to discover not only the transcription modules but also their timing information, which existing approaches rarely annotate. Rather than assuming commonly defined time-constant transcription modules, we describe a module as a set of genes that are co-regulated during a specific period of time, i.e., a time-dependent transcription module (TDTM). A rigorous mathematical definition of a TDTM is provided, which serves as an objective function for retrieving modules. Based on this definition, an effective signature algorithm is proposed that iteratively searches for transcription modules in the time-series data. The proposed method was tested on simulated systems and applied to human time-series microarray data collected during Kaposi's sarcoma-associated herpesvirus (KSHV) infection. The results were verified with the Expression Analysis Systematic Explorer (EASE).

DOI: 10.1109/GENSIPS.2008.4555659; publication date: 2008-01-01
Citations: 0
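The abstract does not reproduce the TDTM definition or the exact update rules, so the sketch below only illustrates the general shape of a signature-algorithm iteration confined to one time window: average the current module's profiles into a signature, score every gene against it, keep the high scorers, and repeat until the gene set stops changing. Centering each gene within the window, the mean + `t_gene` × SD cutoff (default 0.5), and the names `windowed_signature` and `center_window` are hypothetical simplifications. The sketch also takes the window as given, whereas the abstract describes recovering the timing information itself.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Set, Tuple

Expression = Dict[str, Sequence[float]]


def center_window(expr: Expression, window: Tuple[int, int]) -> Dict[str, List[float]]:
    """Keep time points [start, end) of each gene and subtract the gene's window mean."""
    start, end = window
    centered = {}
    for gene, series in expr.items():
        segment = list(series[start:end])
        mean = fmean(segment)
        centered[gene] = [v - mean for v in segment]
    return centered


def windowed_signature(expr: Expression, window: Tuple[int, int], seed: Set[str],
                       t_gene: float = 0.5, max_iter: int = 50) -> Set[str]:
    """Hypothetical signature-style iteration restricted to a single time window."""
    profiles = center_window(expr, window)
    module = set(seed)
    for _ in range(max_iter):
        if not module:
            break  # nothing left to build a signature from
        # Signature: average centered profile of the current module genes, per time point.
        signature = [fmean(col) for col in zip(*(profiles[g] for g in module))]
        # Score every gene by projecting its centered profile onto the signature.
        scores = {g: sum(a * b for a, b in zip(p, signature)) for g, p in profiles.items()}
        values = list(scores.values())
        cutoff = fmean(values) + t_gene * pstdev(values)
        updated = {g for g, s in scores.items() if s > cutoff}
        if updated == module:
            break  # fixed point: the gene set no longer changes
        module = updated
    return module


# Made-up expression values for six genes over six time points (0 to 5).
expr = {
    "A": [0, 0, 1, 3, 1, 0],
    "B": [1, 0, 1, 3, 1, 1],
    "C": [0, 2, 0, 2, 0, 0],
    "D": [3, 1, 2, 1, 2, 0],
    "E": [0, 0, 1, 1, 1, 3],
    "F": [2, 2, 3, 2, 1, 2],
}
module = windowed_signature(expr, window=(2, 5), seed={"A"})
print(sorted(module))  # ['A', 'B', 'C']
```

In the toy data, gene C also peaks at time point 1, where A and B do not, yet within the window (time points 2 to 4) its centered profile matches theirs, so it joins the module seeded by A.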
Journal
IEEE International Workshop on Genomic Signal Processing and Statistics : [proceedings]. IEEE International Workshop on Genomic Signal Processing and Statistics