
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Papers

Pisces: An Accurate and Versatile Single Sample Somatic and Germline Variant Caller
Tamsen Dunn, G. Berry, Dorothea Emig-Agius, Yu Jiang, A. Iyer, N. Udar, Michael P. Strömberg
A method for robustly and accurately detecting rare DNA mutations in tumor samples is critical to cancer research. Because many clinical tissue repositories have only FFPE-degraded tumor samples and no matched normal sample from healthy tissue, the ability to discriminate low-frequency mutations from background noise without a matched normal sample is of particular importance to research. Current state-of-the-art variant callers such as GATK and VarScan focus on germline variant calling (used for detecting inherited mutations following a Mendelian inheritance pattern) or, in the case of FreeBayes and MuTect, on tumor-normal joint variant calling (using the normal sample to help discriminate low-frequency somatic mutations from background noise). We present Pisces, a tumor-only variant caller developed exclusively at Illumina for detecting low-frequency mutations in next-generation sequencing data. Pisces has been an integral part of the Illumina TruSeq Amplicon workflow since 2012 and is available on BaseSpace and on the MiSeq sequencing platforms. Pisces has been publicly available on GitHub (https://github.com/Illumina/Pisces) since 2015. Since then, the Pisces variant calling team has continued to develop Pisces and has released a suite of variant calling tools, including a ReadStitcher, a Variant Phaser, and a Variant Quality Recalibration tool, to be used alongside the core variant caller. Here, we describe the Pisces variant calling tools and core algorithms, along with the common use cases for Pisces (not necessarily restricted to somatic variant calling). We also evaluate Pisces's performance against other variant callers on somatic and germline datasets, drawn both from titrations of well-characterized samples and from a corpus of 500 FFPE-treated clinical trial tumor samples. Our results show that Pisces gives highly accurate results in a variety of contexts. We recommend Pisces for amplicon somatic and germline variant calling.
DOI: https://doi.org/10.1145/3107411.3108203 (published 2017-08-20)
Citations: 2
NGSPipes: Fostering Reproducibility and Scalability in Biosciences
Bruno Dantas, Calmenelias Fleitas, Alexandre Almeida, J. Forja, Alexandre P. Francisco, José Simão, Cátia Vaz
Biosciences have been revolutionised by NGS technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. Although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce, which delays the adoption of new methodologies and hinders innovation. Even if data and tools are freely available, pipelines for data analysis are in general barely described and their setup is far from trivial. NGSPipes addresses these issues by reducing the effort necessary to define, build and deploy pipelines, either at a local workstation or in the cloud. The NGSPipes framework is freely available at http://ngspipes.github.io/.
DOI: https://doi.org/10.1145/3107411.3108213 (published 2017-08-20)
Citations: 1
3D Genome Structure Modeling by Lorentzian Objective Function
Tuan Trieu, Jianlin Cheng
Reconstructing the 3D structure of a genome from chromosome conformation capture data such as Hi-C data has emerged as an important problem in bioinformatics and computational biology in recent years. In this talk, I will present our latest method, which uses a Lorentzian function to describe distance restraints between chromosomal regions; these restraints guide the reconstruction of 3D structures of individual chromosomes and of an entire genome. The method is more robust against noisy distance restraints derived from Hi-C data than traditional objective functions such as the squared error function and the Gaussian probabilistic function. The method can handle both intra- and inter-chromosomal contacts effectively to build 3D structures of a large genome, such as the human genome with its many chromosomes, which is not possible with most existing methods. We have released the Java source code that implements the method (called LorDG) on GitHub (https://github.com/BDM-Lab/LorDG), where it is being used by the community to model 3D genome structures. We are currently improving the method further to build very high-resolution (e.g. 1 kb) 3D genome and chromosome models.
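The robustness argument in this abstract can be illustrated with a minimal sketch. It assumes one common Lorentzian form, c^2 / (c^2 + r^2), where r is the difference between a modeled distance and its restraint; the function names, the scale parameter c, and the toy numbers are illustrative assumptions and are not taken from the LorDG source code.

```python
def squared_error(residual):
    """Penalty that grows without bound as the residual grows."""
    return residual ** 2


def lorentzian_score(residual, c=1.0):
    """Reward that equals 1 at residual 0 and decays toward 0.

    Because the value is bounded in (0, 1], a single badly violated
    restraint can change the total score by at most 1.
    """
    return (c * c) / (c * c + residual * residual)


def total_squared_error(distances, targets):
    """Sum of squared residuals (to be minimized)."""
    return sum(squared_error(d - t) for d, t in zip(distances, targets))


def total_lorentzian(distances, targets, c=1.0):
    """Sum of Lorentzian scores (to be maximized)."""
    return sum(lorentzian_score(d - t, c) for d, t in zip(distances, targets))


# Toy restraints: the last target is a noisy outlier (hypothetical data).
model_distances = [1.0, 2.0, 3.0]
restraints = [1.0, 2.0, 103.0]

print(total_squared_error(model_distances, restraints))  # dominated by the outlier
print(total_lorentzian(model_distances, restraints))     # outlier contributes almost nothing
```

In this toy case the outlier contributes 10000 to the squared error but less than 0.001 to the Lorentzian total, so the two well-satisfied restraints still determine the score.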
DOI: https://doi.org/10.1145/3107411.3107455 (published 2017-08-20)
Citations: 23
Statistical Analysis of Computed Energy Landscapes to Understand Dysfunction in Pathogenic Protein Variants
Wanli Qiao, T. Maximova, E. Plaku, Amarda Shehu
The energy landscape underscores the inherent nature of proteins as dynamic systems interconverting between structures with varying energies. The protein energy landscape contains much of the information needed to characterize protein equilibrium dynamics and relate it to function. It is now possible to reconstruct energy landscapes of medium-size proteins with sufficient prior structure data. These developments turn the focus to tools for analysis and comparison of energy landscapes as a means of formulating hypotheses on the impact of sequence mutations on (dys)function via altered landscape features. We present such a method here and provide a detailed evaluation of its capabilities on an enzyme central to human biology. The work presented here opens up an interesting avenue into automated analysis and summarization of landscapes that lends itself to machine learning approaches at the energy landscape level.
DOI: https://doi.org/10.1145/3107411.3107499 (published 2017-08-20)
Citations: 2
Tumor Neoantigens Derived from RNA Sequencing Analysis
Shaojun Tang, Suthee Rapisuwon, A. Wellstein, Subha Madhavan
Successful treatment of cancers with Immune Checkpoint Inhibitors (ICIs) has been associated with the mutational load of tumors. The biological rationale for this association between mutational load and ICI response is that neoantigens are generated by mutations in protein coding sequences that provide a steady flow of neoantigens to prime the immune system for the production of antigen-specific tumor-infiltrating lymphocytes (TILs). It is thought that mutant protein fragments will lead to altered MHC/peptide recognition and immune cell activation; ICI treatment enhances TIL functionality. Neoantigens are also relevant for an alternative, cell-based immunotherapeutic approach, i.e. Adoptive Cell Transfer (ACT). This concept of neoantigens derived from DNA mutations has led to an intense line of investigation to uncover relevant neoantigens. However, there has been mixed success with the current neoantigen discovery approach based on DNA mutation analysis of tumor samples by exome sequencing of genomic DNA. The current concept of neoantigens derived from mutant DNA ignores an alternative mechanism that can also generate neoantigens in cancers: Posttranscriptional editing of primary RNA. Here we propose to use full-length Single Molecule Real Time (SMRT) RNAseq to uncover pathologically edited mRNAs in cancers and complement the discovery of pathologic mRNA. We will discuss the respective algorithms and propose the combination with identification of candidate neoantigen peptides by mass spectrometry.
DOI: https://doi.org/10.1145/3107411.3108210 (published 2017-08-20)
Citations: 0
The 2017 Computational Structural Bioinformatics Workshop: CSBW 2017
Nurit Haspel, Amarda Shehu, Kevin Molloy
The rapid accumulation of macromolecular structures presents a unique set of challenges and opportunities in the analysis, comparison, modeling, and prediction of macromolecular structures and interactions. This workshop aims to bring together researchers with expertise in bioinformatics, computational biology, structural biology, data mining, optimization and high performance computing to discuss new results, techniques, and research problems in computational structural bioinformatics.
DOI: https://doi.org/10.1145/3107411.3108166 (published 2017-08-20)
Citations: 0
Super-enhancer Dynamics Throughout Myogenesis
Basma Abdelkarim, T. Perkins
Genome-wide ChIP-seq analysis of transcription factor binding and histone marks has uncovered large regulatory domains, known as super-enhancers, that consist of clusters of active enhancers within 12.5 kb of each other. Super-enhancers are characterized by a high abundance of H3K27ac histone marks and disproportionate binding of master regulators and coactivators, and they drive the expression of important cell identity genes. The Rank Ordering of Super-Enhancers (ROSE) algorithm was developed to identify super-enhancers based on these characteristics and has been used extensively on various cell types. Less attention has been paid to how super-enhancers change in different cellular contexts, in particular during the differentiation of stem cells. We use ROSE, in conjunction with other tools, to investigate the dynamics of super-enhancers across myogenesis. Using ChIP-seq data for various transcription factors and stage-matched RNA-seq data, we characterize several super-enhancer regions and their associated genes in myoblasts and myotubes, finding them to be largely stage-specific.
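The clustering idea described in this abstract (enhancers within 12.5 kb of each other form one domain, and domains are then ordered by signal) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ROSE implementation: the tuple layout, the function names, and the toy coordinates are hypothetical, and the sketch omits the cutoff step that separates super-enhancers from typical enhancers.

```python
def stitch_enhancers(enhancers, max_gap=12_500):
    """Merge enhancers on one chromosome whose gap is at most max_gap bp.

    enhancers: list of (start, end, signal) tuples.
    Returns a list of stitched (start, end, summed_signal) tuples.
    """
    stitched = []
    for start, end, signal in sorted(enhancers):
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous region: extend it and add the signal.
            prev_start, prev_end, prev_signal = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def rank_by_signal(regions):
    """Order stitched regions from highest to lowest total signal."""
    return sorted(regions, key=lambda region: region[2], reverse=True)


# Hypothetical H3K27ac-marked enhancers on a single chromosome.
toy_enhancers = [(1_000, 2_000, 5.0), (10_000, 11_000, 7.0), (40_000, 41_000, 3.0)]
regions = stitch_enhancers(toy_enhancers)
print(rank_by_signal(regions))
```

Here the first two enhancers are 8 kb apart and are stitched into one region, while the third lies 29 kb downstream and stays separate.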
DOI: https://doi.org/10.1145/3107411.3108187 (published 2017-08-20)
Citations: 0
Session details: Session 17: Biological Modeling
N. Yanamala
DOI: https://doi.org/10.1145/3254560 (published 2017-08-20)
Citations: 0
Dependency and AMR Embeddings for Drug-Drug Interaction Extraction from Biomedical Literature
Yanshan Wang, Sijia Liu, M. Rastegar-Mojarad, Liwei Wang, F. Shen, Fei Liu, Hongfang Liu
A drug-drug interaction (DDI) is an unexpected change in a drug's effect on the human body when the drug and a second drug are co-prescribed and taken together. As many DDIs are reported in the biomedical literature, it is important to mine DDI information from the literature to keep DDI knowledge up to date. One of the SemEval challenges in 2011 and 2013 was designed to tackle this task, and the best system achieved an F1 score of 0.80. In this paper, we propose to utilize dependency embeddings and Abstract Meaning Representation (AMR) embeddings as features for extracting DDIs. Our contribution is two-fold. First, we employed dependency embeddings, previously shown to be effective for sentence classification, for DDI extraction. The dependency embeddings incorporate structural syntactic contexts, which are not present in conventional word embeddings. Second, we proposed a novel syntactic embedding approach using AMR. AMR aims to abstract away from syntactic idiosyncrasies and attempts to capture only the core meaning of a sentence, which could potentially improve DDI extraction from sentences. Two classifiers (Support Vector Machine and Random Forest) taking these embedding features as input were evaluated on the DDIExtraction 2013 challenge corpus. The experimental results show the effectiveness of dependency and AMR embeddings in the DDI extraction task. The best performance was obtained by combining word, dependency and AMR embeddings (F1 score = 0.84).
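Two ideas in this abstract, combining several embedding types into one feature vector and scoring a classifier with F1, can be sketched as below. The sketch assumes that "combining" means simple concatenation and uses a binary F1 score; the abstract does not state how the embeddings were combined or how F1 was averaged, so both choices, along with all names and toy values, are illustrative assumptions.

```python
def concatenate_features(*vectors):
    """Join several embedding vectors into one flat feature vector."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined


def f1_score(true_labels, predicted_labels, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical word, dependency and AMR embeddings for one candidate drug pair.
features = concatenate_features([0.1, 0.2], [0.3], [0.4, 0.5])
print(features)

# Hypothetical gold labels and classifier output (1 = interaction).
print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

The resulting feature vector would then be passed to a classifier such as a Support Vector Machine or Random Forest, as the abstract describes.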
DOI: https://doi.org/10.1145/3107411.3107426 (published 2017-08-20)
Citations: 32
Preconditioned Random Forest Regression: Application to Genome-Wide Study for Radiotherapy Toxicity Prediction
Sangkyun Lee, S. Kerns, B. Rosenstein, H. Ostrer, J. Deasy, J. Oh
Urinary toxicity after radiotherapy (RT) limits the quality of life of prostate cancer patients, and clinically actionable prediction of this toxicity has yet to be achieved. We aim to exploit genome-wide variants to accurately identify patients at higher congenital toxicity risk. We applied preconditioned random forest regression (PRFR) to predict four urinary symptoms. For the weak stream endpoint, the PRFR model achieved an area under the curve (AUC) of 0.7 on holdout validation. Preconditioning improved the performance of the random forest. Gene ontology (GO) analysis showed that neurogenic biological processes are associated with the toxicity. With further validation, the predictive model could benefit prostate cancer patients treated with radiotherapy.
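For readers unfamiliar with the metric reported in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed on a holdout set. It uses the rank interpretation of AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the labels and scores are hypothetical illustrative values, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical holdout set: 1 = symptom observed, 0 = not observed.
holdout_labels = [1, 0, 1, 0, 0, 1, 0, 1]
# Hypothetical model scores for the same eight patients.
holdout_scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.8, 0.6, 0.7]

auc = roc_auc(holdout_labels, holdout_scores)
print(f"holdout AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported value of 0.7 indicates moderate discrimination.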
DOI: https://doi.org/10.1145/3107411.3108201. Published: 2017-08-20.
Citations: 0