
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Articles

Session details: Session 19: Automated Diagnosys and Prediction II
Dong Si
DOI: https://doi.org/10.1145/3254562 (published 2017-08-20)
Citations: 0
Associating Genomics and Clinical Information by Means of Semantic Based Ranking
F. Cristiano, G. Tradigo, P. Veltri
Relating genomic data to clinical and disease information is a new challenge for life sciences research. High-performance computational platforms and new technologies (e.g., Next Generation Sequencing) allow the production of huge quantities of biological data. Genomic ontologies describing genes and functions, as well as databases containing disease groups, are now available. We focus on the problem of enriching genomic datasets containing miRNA genes by adding related disease information. The enrichment is performed by using ontologies to find gene-to-disease associations. Ontologies are used to describe molecular genomic processes and functions, as well as disease classes and experimental details. The International Classification of Diseases (ICD) is used to classify diseases and clinical information. Diseases are ranked using an algorithm based on Google PageRank. An application tool called Surf App! has been developed in R and tested on a neurological disease dataset.
DOI: https://doi.org/10.1145/3107411.3107436 (published 2017-08-20)
Citations: 0
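The abstract by Cristiano, Tradigo and Veltri says diseases are ranked with a PageRank-based algorithm but gives no further detail. The following is a minimal sketch of plain PageRank on a small directed graph, not the authors' method (their tool is written in R). The graph layout (miRNA gene to ontology term to disease class), the node names, the damping factor and the iteration count are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Return a PageRank score for every node in a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank sitting on nodes with no outgoing links is shared by all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def rank_diseases(edges, diseases):
    """Order the given disease nodes from highest to lowest PageRank score."""
    scores = pagerank(edges)
    return sorted(diseases, key=lambda d: scores[d], reverse=True)


# Hypothetical toy graph: miRNA gene -> ontology term -> disease class.
toy_edges = [
    ("mir_a", "term_1"),
    ("mir_b", "term_1"),
    ("mir_c", "term_2"),
    ("term_1", "disease_x"),
    ("term_2", "disease_y"),
]
print(rank_diseases(toy_edges, ["disease_y", "disease_x"]))
```

In this toy graph, the disease reached through the ontology term shared by two genes ends up ranked above the disease reached through a term linked to a single gene.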
Antidote Application: An Educational System for Treatment of Common Toxin Overdose
Jon Long, Yingyuan Zhang, V. Brusic, Lubomir T. Chitkushev, Guanglan Zhang
Poisonings account for almost 1% of emergency room visits each year. Time is a critical factor in dealing with a toxicologic emergency. Delay in dispensing the first antidote dose can lead to life-threatening sequelae. Current toxicological resources that support treatment decisions are broad in scope, time-consuming to read, or at times unavailable. Our review of current toxicological resources revealed a gap in their ability to provide expedient calculations and recommendations about the appropriate course of treatment. To bridge the gap, we developed the Antidote Application (AA), a computational system that automatically provides patient-specific antidote treatment recommendations and individualized dose calculations. We implemented 27 algorithms that describe uses approved by the US Food and Drug Administration (FDA) and evidence-based practices found in the primary literature for the treatment of common toxin exposure. The AA covers 29 antidotes recommended by Poison Control and toxicology experts, 19 poison classes and 31 poisons, which represent over 200 toxic entities. To the best of our knowledge, the AA is the first educational decision support system in toxicology that provides patient-specific treatment recommendations and drug dose calculations. The AA is publicly available at http://projects.met-hilab.org/antidote/.
DOI: https://doi.org/10.1145/3107411.3107415 (published 2017-08-20)
Citations: 2
Predicting Breast Cancer Outcome under Different Treatments by Feature Selection Approaches
H. Pham, L. Rueda, A. Ngom
Gene expression data have been used in many studies to help reveal the underlying mechanisms of many diseases. In this study, we applied feature selection techniques to data from breast cancer patients in the METABRIC study to predict whether patients will be disease free under different treatments. Our prediction models achieve high performance; thus, the genes in those models might help reveal the mechanism of the disease, and these potential biomarkers could become targets for new therapies.
DOI: https://doi.org/10.1145/3107411.3108226 (published 2017-08-20)
Citations: 1
SeqyClean: A Pipeline for High-throughput Sequence Data Preprocessing
I. Zhbannikov, Samuel S. Hunter, J. Foster, M. Settles
Modern high-throughput sequencing instruments produce massive amounts of data, which often contains noise in the form of sequencing errors, sequencing adaptors, and contaminating reads. This noise complicates genomics studies. Although many preprocessing software tools have been developed to reduce the sequence noise, many of them cannot handle data from multiple technologies and few address more than one type of noise. We present SeqyClean, a comprehensive preprocessing software pipeline. SeqyClean effectively removes multiple sources of noise in high throughput sequence data and, according to our tests, outperforms other available preprocessing tools. We show that preprocessing data with SeqyClean first improves both de-novo genome assembly and genome mapping. We have used SeqyClean extensively in the genomics core at the Institute for Bioinformatics and Evolutionary STudies (IBEST) at the University of Idaho, so it has been validated with both test and production data. SeqyClean is available as open source software under the MIT License at http://github.com/ibest/seqyclean
DOI: https://doi.org/10.1145/3107411.3107446 (published 2017-08-20)
Citations: 58
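The SeqyClean abstract names three kinds of noise (sequencing errors, adaptors, contaminating reads) without describing how they are removed. As a rough illustration of what read preprocessing involves, the sketch below trims an exact adaptor match and a low-quality 3' tail from a single read. It is not SeqyClean's algorithm: the function names, the exact-match adaptor search, the Phred+33 encoding, the thresholds and the example adaptor string are all assumptions.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim_low_quality_tail(seq, qual, min_phred=20, offset=33):
    """Drop trailing bases whose Phred score (assumed Phred+33) is below min_phred."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq, qual, adapter, min_phred=20, min_length=10):
    """Apply both trimming steps; return None if the read becomes too short."""
    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual, min_phred)
    if len(seq) < min_length:
        return None
    return seq, qual


# Hypothetical read: 12 genomic bases followed by an example adapter string.
ADAPTER = "AGATCGGAAGAGC"
read_seq = "ACGTACGTACGT" + ADAPTER
read_qual = "I" * 10 + "##" + "I" * len(ADAPTER)
print(clean_read(read_seq, read_qual, ADAPTER))
```

A production tool would also tolerate mismatches in the adaptor, handle paired-end reads and screen for contaminant sequences, none of which this sketch attempts.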
Integrative Sufficient Dimension Reduction Methods for Multi-Omics Data Analysis
Yashita Jain, Shanshan Ding
With the advent of high-throughput genome-wide assays, it has become possible to measure multiple types of genomic data simultaneously. Several projects such as TCGA, ICGC, and NCI-60 have generated comprehensive, multi-dimensional maps of key genomic changes (miRNA, mRNA, proteomics, etc.) from cancer samples [2,4]. These genomic data can be used for classifying tumor types [5]. Integrative analysis of these data from multiple sources can potentially provide additional biological insights, but methods for such analysis are lacking. One widely used way to handle high-dimensional data is to remove redundant information in the integrated sample. Most of the expressed genes overlap and can be projected onto a lower dimension, and then be used to classify different tumor types with little or no loss of information. Sufficient dimension reduction (SDR) [1], a supervised dimension reduction approach, is well suited to this goal. In this paper, we propose a novel integrative SDR method that reduces the dimensions of multiple data types simultaneously while sharing common latent structures to improve prediction and interpretation. In particular, we extend the sliced inverse regression (SIR) technique, a major SDR method, to integrate multiple omics data for simultaneous dimension reduction. SIR is a supervised dimension reduction method that assumes that the outcome variable Y depends on the predictor variable X through d unknown linear combinations of the predictor [3]. The predictor variable is replaced by its projection onto a lower-dimensional subspace of the predictor space without loss of information. The aim is to find the intersection of all subspaces δ of the predictor space satisfying the property $Y \perp\!\!\!\perp X \mid P_{\delta} X$; this intersection is called the central subspace (CS). To integrate multiple types of data, we propose and implement a new integrative sufficient dimension reduction method extending SIR [3], called integrative SIR.
The main idea is that we take into account all the multi-omics data information simultaneously while finding a basis matrix for each data type, with some latent structures shared across data types. Finally, we obtain d-dimensional data, where d is much smaller than the original data dimension. The reduced dimension d was chosen by cross-validation. To demonstrate the integrated analysis of multi-omics data, we applied and compared conventional SIR and integrative SIR to analyze mRNA, miRNA and proteomics expression profiles of a subset of cell lines from the NCI-60 panel. The data used are taken from [6]. The outcomes to classify are CNS, leukemia and melanoma tumor types. We pre-screened 400 variables from each data type using a high-variance criterion. To estimate classification error, we performed random forest classification with leave-one-out cross-validation after applying each method. We found that integrative SIR leads to lower classification error than conventional SIR. To summarize, we propose a novel integrative SIR method, a supervised dimension reduction technique for the integrative analysis of multi-omics data types. Unlike conventional SDR methods, the new method can reduce the dimensions of multiple omics data simultaneously while sharing common latent structures across data types, without losing any predictive information. By effectively capturing common information, our numerical studies show that integrative SIR classifies tumor types more accurately than conventional SDR methods.
DOI: https://doi.org/10.1145/3107411.3108225 (published 2017-08-20)
Citations: 0
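For readers unfamiliar with sliced inverse regression, the conventional single-dataset version that the Jain and Ding abstract builds on can be summarized as follows. This is the textbook formulation, not the integrative extension proposed in the paper, whose details the abstract does not give. The symbols $H$ (number of slices), $\hat{p}_h$ (fraction of samples in slice $h$), $\bar{X}_h$ (mean of $X$ within slice $h$) and $\hat{\Sigma}$ (sample covariance of $X$) are notation introduced here.

```latex
% Model: Y depends on X only through d linear combinations of X.
Y = f\!\left(\beta_1^{\top} X, \ldots, \beta_d^{\top} X, \varepsilon\right),
\qquad \varepsilon \perp\!\!\!\perp X

% Equivalent conditional-independence statement, with
% \delta = \operatorname{span}(\beta_1, \ldots, \beta_d):
Y \perp\!\!\!\perp X \mid P_{\delta} X

% SIR estimate: sort the samples by Y, cut them into H slices,
% and form the weighted covariance of the slice means.
\hat{M} = \sum_{h=1}^{H} \hat{p}_h
          \left(\bar{X}_h - \bar{X}\right)\left(\bar{X}_h - \bar{X}\right)^{\top}

% The estimated directions are the top d generalized eigenvectors.
\hat{M}\,\hat{\beta}_k = \hat{\lambda}_k\,\hat{\Sigma}\,\hat{\beta}_k,
\qquad k = 1, \ldots, d,
\qquad \hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_d
```

The reduced predictors are then $\hat{\beta}_1^{\top} X, \ldots, \hat{\beta}_d^{\top} X$, which can be passed to a classifier such as the random forest mentioned in the abstract.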
Nirvana: Clinical Grade Variant Annotator
Michael P. Strömberg, R. Roy, J. Lajugie, Yu Jiang, Haochen Li, E. Margulies
Sequencing an individual genome typically produces approximately three million variants relative to the human reference genome. The consequence of each variant depends on its location and nature and is a key question for genetic analysts performing clinical diagnosis. Variant annotation describes how a variant affects the sample's genome. These annotations include the functional consequence on the different transcripts of a gene or in proximal regulatory regions. Annotation also includes additional data on what is known about a given variant, which can help in understanding its relevance to a given line of investigation. Often these data are provided by different sources and contain allele frequencies for different populations, clinical implications, relevance to cancer types, additional studies, etc. Ultimately this information helps clinicians interpret variants when providing a diagnosis. The three most widely used open source annotation tools are VEP, SnpEff and AnnoVar. VEP is widely considered to be the most accurate of the three, but is also slower than both SnpEff and AnnoVar. When annotating the variants from a 30x genome (NA12878) using one core, VEP finished in 18 hours, whereas SnpEff 4.3g and AnnoVar finished in 15 min and 67 min, respectively. We present Nirvana, an open source clinical variant annotator that is both accurate (over 99.9% concordance with VEP) and fast (7 min to annotate NA12878). Nirvana is used in all of Illumina's relevant analysis pipelines and is tested rigorously to ensure adherence to clinical standards.
DOI: https://doi.org/10.1145/3107411.3108204 (published 2017-08-20)
Citations: 5
Analysis of Single Cells on a Pseudotime Scale along Postnatal Pancreatic Beta Cell Development
F. Mulas, Chun Zeng, Yinghui Sui, Tiffany Guan, Nathanael Miller, Yuliang Tan, Fenfen Liu, Wen Jin, Andrea C. Carrano, M. Huising, O. Shirihai, Gene W. Yeo, M. Sander
Single-cell RNA-seq generates gene expression profiles of individual cells and has furthered our understanding of the developmental and cellular hierarchy within complex tissues. One computational challenge in analyzing single-cell data sets is reconstructing the progression of individual cells with respect to the gradual transition of their transcriptomes. While a number of single-cell ordering tools have been proposed, many of these require knowledge of progression markers or time delineators. Here, we adapted an algorithm previously developed for temporally ordering bulk microarray samples [1] to reconstruct the developmental trajectory of pancreatic beta-cells postnatally. To accomplish this, we applied a multi-step pipeline to analyze single-cell RNA-seq data sets from isolated beta-cells at five different time points between birth and post-weaning. Specifically, we i) ordered cells along a linear trajectory (the Pseudotime Scale) by applying one-dimensional principal component analysis to the normalized data matrix; ii) identified annotated and de-novo gene sets significantly regulated along the trajectory; iii) built a network of top-regulated genes using protein interaction repositories; and iv) scored genes for their network connectivity to transcription factors [2]. A systematic comparison showed that our approach was more accurate in correctly ordering cells for our data set than previously reported methods and allowed for direct comparisons with external data sets. Importantly, our analysis revealed never before seen changes in beta-cell metabolism and in levels of mitochondrial reactive oxygen species. We demonstrated experimentally a role for these changes in the regulation of postnatal beta-cell proliferation. Our pipeline identified maturation-related changes in gene expression not captured when evaluating bulk gene expression data across the developmental time course. 
The proposed methodology has a broad applicability beyond the context here described and could be used to examine the trajectory of other single cell types along a continuous course of cell state changes.
单细胞RNA-seq生成单个细胞的基因表达谱,并进一步加深了我们对复杂组织中发育和细胞层次结构的理解。分析单细胞数据集的一个计算挑战是重建相对于其转录组的逐渐转变的单个细胞的进展。而许多单细胞订购工具提出了许多需要知识的进展标记或时间描写的人。在这里,我们采用了先前开发的一种算法,用于临时订购大量微阵列样本[1],以重建出生后胰腺β细胞的发育轨迹。为了实现这一目标,我们应用了一个多步骤管道来分析从出生到断奶后五个不同时间点分离的β细胞的单细胞RNA-seq数据集。具体来说,我们i)通过对归一化数据矩阵应用一维主成分分析,沿线性轨迹(伪时间尺度)对细胞进行排序;Ii)鉴定出沿轨迹显著调控的注释和去novo基因集;Iii)利用蛋白质相互作用库构建了顶级调控基因网络;和iv)得到的基因转录因子[2]的网络连接。系统比较表明,我们的方法在正确排序数据集的单元格方面比以前报道的方法更准确,并允许与外部数据集进行直接比较。重要的是,我们的分析显示从未见过胰腺β-细胞代谢的改变,线粒体活性氧的水平。我们通过实验证明了这些变化在出生后β细胞增殖调节中的作用。我们的研究管道确定了在整个发育过程中评估大量基因表达数据时未捕获的基因表达的成熟相关变化。所提出的方法具有广泛的适用性,超出了这里所描述的上下文,可以用来检查沿着细胞状态变化的连续过程中其他单细胞类型的轨迹。
DOI: https://doi.org/10.1145/3107411.3107458. Publication date: 2017-08-20.
Citations: 0
Investigating Rigidity Properties of Protein Cavities
Stephanie Mason, T. Woods, B. Chen, F. Jagodzinski
Cavities in proteins facilitate a variety of biochemical processes. The shapes and sizes of cavities contribute to specificity in ligand binding and in docking with other biomolecules. A deep understanding of cavity properties may enable new insights into protein-protein interactions, ligand binding, and structure-based drug design studies. In this work we explore how biological properties of protein cavities, such as size and residue membership, correlate with the flexibility of the cavity as computed using an efficient graph theoretic rigidity algorithm. We hypothesize that various rigidity properties of protein cavities depend on cavity surface area. We enumerate a set of cavity rigidity metrics and demonstrate their use in characterizing over 120,000 cavities from approximately 2,500 chains. We show that cavity size does correlate with some, but not all, cavity rigidity metrics.
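The abstract does not say which correlation measure was used to relate cavity size to the rigidity metrics. As one plausible illustration, the sketch below computes a Spearman rank correlation between cavity surface area and a single rigidity metric. The function names and the sample values are assumptions for illustration, not data or code from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up values: surface area per cavity and one hypothetical metric.
surface_area = [120.0, 340.5, 80.2, 510.0, 260.0]
rigidity_metric = [0.42, 0.61, 0.35, 0.58, 0.50]
rho = spearman(surface_area, rigidity_metric)
print(round(rho, 3))
```

A rank-based measure is a reasonable default here because it does not assume a linear relationship between size and rigidity. Across many metrics, some would be expected to show a clear monotone trend and others none, which matches the "some, but not all" finding.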
DOI: https://doi.org/10.1145/3107411.3107502. Publication date: 2017-08-20.
Citations: 0
Bias and Noise Cancellation for Robust Copy Number Variation Detection
Fatima Zare, Sardar Ansari, K. Najarian, S. Nabavi
High-throughput next generation sequencing (NGS) technologies have created an opportunity for detecting copy number variations (CNVs) more accurately. In this work, we introduce a novel preprocessing pipeline to improve the detection accuracy of CNVs in heterogeneous NGS data such as cancer whole exome sequencing data. We employ several normalizations to reduce biases due to GC content, mappability, and tumor contamination. We also utilize the Taut String method as an efficient and effective smoothing approach to reduce noise.
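The abstract does not specify how the GC-content normalization is carried out. One widely used scheme, shown below as a sketch under that assumption, rescales each genomic window's read count by the ratio of the overall median count to the median count of windows with similar GC content. The function name, the `bin_width` parameter, and the sample numbers are illustrative only. The mappability and tumor-contamination corrections and the Taut String smoothing step are not shown.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.05):
    """Median-ratio GC correction of per-window read counts.

    read_counts:  reads mapped to each genomic window
    gc_fractions: GC fraction (0..1) of each window
    bin_width:    width of the GC strata used to group windows
    """
    # Assign every window to a GC stratum.
    keys = [int(gc / bin_width) for gc in gc_fractions]
    strata = defaultdict(list)
    for count, key in zip(read_counts, keys):
        strata[key].append(count)

    overall_median = median(read_counts)
    stratum_median = {key: median(vals) for key, vals in strata.items()}

    corrected = []
    for count, key in zip(read_counts, keys):
        m = stratum_median[key]
        # Leave the count unchanged if its stratum has a zero median.
        corrected.append(count * overall_median / m if m > 0 else float(count))
    return corrected


# Made-up example: high-GC windows are over-represented twofold.
counts = [100, 110, 200, 220]
gc = [0.31, 0.33, 0.61, 0.63]
corrected = gc_normalize(counts, gc)
print([round(c, 1) for c in corrected])
```

After correction, every GC stratum has the same median depth, so the remaining differences between windows are less likely to reflect GC bias alone. The stratum width is a tuning choice: narrow strata track the bias curve more closely but need more windows per stratum for a stable median.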
DOI: https://doi.org/10.1145/3107411.3108199. Publication date: 2017-08-20.
Citations: 0