Latest articles: Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )

Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs.
Tristan Bepler, Andrew Morin, Alex J Noble, Julia Brasch, Lawrence Shapiro, Bonnie Berger
RECOMB, vol. 10812, pp. 245-247; published 2018-04-01. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5917602/pdf/nihms959799.pdf
Citations: 0
Latent Variable Model for Aligning Barcoded Short-Reads Improves Downstream Analyses.
Ariya Shajii, Ibrahim Numanagić, Bonnie Berger
RECOMB, vol. 10812, pp. 280-282; published 2018-04-01. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5989713/pdf/nihms968810.pdf
Citations: 0
Progressive calibration and averaging for tandem mass spectrometry statistical confidence estimation: Why settle for a single decoy?
Uri Keich, William Stafford Noble

Estimating the false discovery rate (FDR) among a list of tandem mass spectrum identifications is mostly done through target-decoy competition (TDC). Here we offer two new methods that can use an arbitrarily small number of additional randomly drawn decoy databases to improve TDC. Specifically, "Partial Calibration" utilizes a new meta-scoring scheme that allows us to gradually benefit from the increase in the number of identifications that calibration yields, and "Averaged TDC" (a-TDC) reduces the liberal bias of TDC for small FDR values as well as its variability throughout. Combining a-TDC with "Progressive Calibration" (PC), which attempts to find the "right" number of decoys required for calibration, we see substantial impact on real datasets: when analyzing the Plasmodium falciparum data, it typically yields almost the entire 17% increase in discoveries that "full calibration" yields (at FDR level 0.05) while using 60 times fewer decoys. Our methods are further validated using a novel realistic simulation scheme and, importantly, they apply more generally to the problem of controlling the FDR among discoveries from searching an incomplete database.

RECOMB, vol. 10229, pp. 99-116; published 2017-05-01. DOI: 10.1007/978-3-319-56970-3_7
Citations: 12
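The abstract above builds on plain target-decoy competition (TDC). The minimal sketch below shows that baseline only, assuming one best target score and one best decoy score per spectrum and the common "(decoy wins + 1) / target wins" FDR estimate. It does not implement the paper's Partial Calibration, a-TDC, or Progressive Calibration, and the function name `tdc_discoveries` is hypothetical.

```python
from typing import List, Sequence, Tuple


def tdc_discoveries(target: Sequence[float], decoy: Sequence[float], alpha: float) -> int:
    """Count target matches accepted by single-decoy target-decoy competition at FDR level alpha.

    target[i] and decoy[i] are the best target and decoy scores for spectrum i.
    """
    if len(target) != len(decoy):
        raise ValueError("need one target and one decoy score per spectrum")
    # Competition: each spectrum keeps only its better match (ties go to the decoy here).
    winners: List[Tuple[float, bool]] = [
        (t, True) if t > d else (d, False) for t, d in zip(target, decoy)
    ]
    winners.sort(key=lambda w: w[0], reverse=True)
    n_target = n_decoy = best = 0
    for i, (score, is_target) in enumerate(winners):
        if is_target:
            n_target += 1
        else:
            n_decoy += 1
        # Only evaluate cutoffs that fall between distinct scores.
        at_cutoff = i + 1 == len(winners) or winners[i + 1][0] < score
        # (decoy wins + 1) / target wins estimates the FDR among targets above the cutoff;
        # scanning downward keeps the largest accepted set.
        if at_cutoff and (n_decoy + 1) / max(n_target, 1) <= alpha:
            best = n_target
    return best
```

For example, with target scores `[10, 9, 8, 7, 1]` and decoy scores `[1, 1, 1, 1, 5]`, four targets are accepted at `alpha=0.5` but none at `alpha=0.2`, because the "+1" term penalizes very small lists. That variability at small FDR levels is what the averaging and calibration methods in the abstract aim to reduce.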
Resolving multicopy duplications de novo using polyploid phasing.
Mark J Chaisson, Sudipto Mukherjee, Sreeram Kannan, Evan E Eichler

While the rise of single-molecule sequencing systems has enabled an unprecedented improvement in the ability to assemble complex regions of the genome, long segmental duplications in the genome still remain a challenging frontier in assembly. Segmental duplications are at the same time both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variations in multicopy long segmental duplications by developing and utilizing algorithms for polyploid phasing. We develop two algorithms: the first is targeted at maximizing the likelihood of observing the reads given the underlying haplotypes using discrete matrix completion. The second algorithm is based on correlation clustering and exploits an assumption, often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., what fraction of the reads are clustered correctly. On both performance metrics, we find that our algorithms dominate existing algorithms on more than 93% of the datasets. While discrete matrix completion performs better on likelihood score, the correlation clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets, whereas existing algorithms reconstruct less than 1 copy on average.

RECOMB, vol. 10229, pp. 117-133; published 2017-05-01. DOI: 10.1007/978-3-319-56970-3_8. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5553120/pdf/nihms883111.pdf
Citations: 0
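To illustrate the correlation-clustering idea in the abstract above, here is a minimal sketch, assuming each read is encoded as a vector of alleles (0, 1, or `None` if uncovered) at the paralog-specific variant sites. Two reads get a positive edge when they agree more than they disagree. The clustering step is the generic greedy pivot heuristic (KwikCluster usually picks pivots at random; this version takes the first unassigned read so the output is deterministic). It is not the authors' algorithm, and all names are hypothetical.

```python
from typing import List, Optional, Sequence

Read = Sequence[Optional[int]]  # allele at each variant site: 0, 1, or None if uncovered


def read_similarity(a: Read, b: Read) -> int:
    """Agreements minus disagreements over the variant sites both reads cover."""
    score = 0
    for x, y in zip(a, b):
        if x is not None and y is not None:
            score += 1 if x == y else -1
    return score


def pivot_clustering(reads: List[Read]) -> List[List[int]]:
    """Greedy pivot heuristic for correlation clustering.

    Each pivot absorbs every still-unassigned read that it is positively linked to;
    each resulting cluster stands for one putative paralog copy.
    """
    unassigned = list(range(len(reads)))
    clusters: List[List[int]] = []
    while unassigned:
        pivot, rest = unassigned[0], unassigned[1:]
        members = [pivot] + [i for i in rest if read_similarity(reads[pivot], reads[i]) > 0]
        clusters.append(members)
        taken = set(members)
        unassigned = [i for i in unassigned if i not in taken]
    return clusters
```

With reads `[[0, 0, 1, None], [0, 0, 1, 1], [1, 1, 0, 0], [None, 1, 0, 0], [0, None, 1, 1]]`, the sketch returns `[[0, 1, 4], [2, 3]]`: two groups of reads sharing their paralog-specific alleles.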
Joker de Bruijn: Sequence Libraries to Cover All k-mers Using Joker Characters.
Yaron Orenstein, Ryan Kim, Polly Fordyce, Bonnie Berger
RECOMB, vol. 10229, pp. 389-390; published 2017-05-01. DOI: 10.1007/978-3-319-56970-3
Citations: 1
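This entry has no abstract. The baseline its title refers to, a sequence library covering all k-mers, is the classic de Bruijn sequence. The sketch below builds one with the standard Fredricksen-Kessler-Maiorana (Lyndon word) construction and then unrolls the cycle into a linear sequence. It does not use joker (wildcard) characters, which are the subject of the paper, and the function names are hypothetical.

```python
from typing import List


def de_bruijn(alphabet: str, k: int) -> str:
    """Cyclic de Bruijn sequence: every k-mer over `alphabet` occurs exactly once, wrapping around."""
    n = len(alphabet)
    a = [0] * (n * k)
    out: List[int] = []

    def extend(t: int, p: int) -> None:
        # Concatenate, in lexicographic order, the Lyndon words whose length divides k.
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                extend(t + 1, t)

    extend(1, 1)
    return "".join(alphabet[i] for i in out)


def linear_library(alphabet: str, k: int) -> str:
    """Unroll the cycle so every k-mer appears as a substring of one linear sequence."""
    cyc = de_bruijn(alphabet, k)
    return cyc + cyc[: k - 1]
```

For DNA and k = 3, `linear_library("ACGT", 3)` is 66 nt long and contains each of the 64 possible 3-mers exactly once. This length, |alphabet|^k + k - 1, is the minimum for a linear sequence without wildcards.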
Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding.
Jingkang Zhao, Dongshunyi Li, Jungkyun Seo, Andrew S Allen, Raluca Gordân

Many recent studies have emphasized the importance of genetic variants and mutations in cancer and other complex human diseases. The overwhelming majority of these variants occur in non-coding portions of the genome, where they can have a functional impact by disrupting regulatory interactions between transcription factors (TFs) and DNA. Here, we present a method for assessing the impact of non-coding mutations on TF-DNA interactions, based on regression models of DNA-binding specificity trained on high-throughput in vitro data. We use ordinary least squares (OLS) to estimate the parameters of the binding model for each TF, and we show that our predictions of TF-binding changes due to DNA mutations correlate well with measured changes in gene expression. In addition, by leveraging distributional results associated with OLS estimation, for each predicted change in TF binding we also compute a normalized score (z-score) and a significance value (p-value) reflecting our confidence that the mutation affects TF binding. We use this approach to analyze a large set of pathogenic non-coding variants, and we show that these variants lead to significant differences in TF binding between alleles, compared to a control set of common variants. Thus, our results indicate that there is a strong regulatory component to the pathogenic non-coding variants identified thus far.

RECOMB, vol. 10229, pp. 336-352; published 2017-05-01. DOI: 10.1007/978-3-319-56970-3_21
Citations: 17
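The OLS reasoning in the abstract above can be sketched as follows. Fit a linear binding model y = Xβ + ε and set c = x_alt - x_ref. The predicted binding change is Δ = cᵀβ̂, and its standard error is σ̂·sqrt(cᵀ(XᵀX)⁻¹c), which gives a z-score and a two-sided p-value. This sketch assumes a simple position-specific 1-mer encoding with A as the baseline (the paper's feature set may differ) and uses a normal approximation (the exact OLS result is a t distribution). `fit_ols` and `binding_change` are hypothetical names.

```python
import math

import numpy as np

BASES = "ACGT"


def encode(seq: str) -> np.ndarray:
    """Intercept plus C/G/T indicators at each position (A is the baseline)."""
    row = [1.0]
    for base in seq:
        row.extend(1.0 if base == b else 0.0 for b in BASES[1:])
    return np.array(row)


def fit_ols(seqs, signal):
    """Fit y = X beta by least squares; return beta, residual variance, and (X^T X)^-1."""
    X = np.vstack([encode(s) for s in seqs])
    y = np.asarray(signal, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (X.shape[0] - X.shape[1])
    cov_unscaled = np.linalg.pinv(X.T @ X)
    return beta, sigma2, cov_unscaled


def binding_change(model, ref: str, alt: str):
    """Predicted change in binding from ref to alt, with z-score and two-sided p-value."""
    beta, sigma2, cov_unscaled = model
    c = encode(alt) - encode(ref)
    delta = float(c @ beta)
    se = math.sqrt(sigma2 * float(c @ cov_unscaled @ c))
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # normal approximation to the t distribution
    return delta, z, p
```

In a simulated check where a G at position 2 adds 2.0 units of binding signal, mutating `"AAAAAA"` to `"AAGAAA"` produces Δ close to 2.0 with a very large z-score. A mutation at an uninformative position produces a z-score near 0.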
Longitudinal Genotype-Phenotype Association Study via Temporal Structure Auto-Learning Predictive Model.
Xiaoqian Wang, Jingwen Yan, Xiaohui Yao, Sungeun Kim, Kwangsik Nho, Shannon L Risacher, Andrew J Saykin, Li Shen, Heng Huang

With rapid progress in high-throughput genotyping and neuroimaging, imaging genetics has gained significant attention in the research of complex brain disorders, such as Alzheimer's Disease (AD). Genotype-phenotype association studies using imaging genetic data have the potential to reveal the genetic basis and biological mechanisms of brain structure and function. AD is a progressive neurodegenerative disease; thus, it is crucial to look into the relations between SNPs and longitudinal variations of neuroimaging phenotypes. Although some machine learning models have recently been proposed to capture longitudinal patterns in genotype-phenotype association studies, most of them require fixed longitudinal structures of prediction tasks and cannot automatically learn the interrelations among longitudinal prediction tasks. To address this challenge, we proposed a novel temporal structure auto-learning model that automatically uncovers longitudinal genotype-phenotype interrelations and simultaneously uses these interrelated structures to enhance phenotype prediction. We conducted longitudinal phenotype prediction experiments on the ADNI cohort, including 3,123 SNPs and 2 types of biomarkers, VBM and FreeSurfer. Empirical results demonstrated the advantages of our proposed model over competing methods. Moreover, supporting literature was identified for our top-selected SNPs, which demonstrated the plausibility of our prediction results. An executable program is available online at https://github.com/littleq1991/sparse_lowRank_regression.

RECOMB, vol. 10229, pp. 287-302; published 2017-05-01. DOI: 10.1007/978-3-319-56970-3_18
Citations: 11
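One common way to let related prediction tasks share structure is low-rank multi-task regression. Here each column of Y is the phenotype at one time point, and each column of W is that time point's SNP weight vector. The sketch below minimizes 0.5·‖XW - Y‖²_F + λ‖W‖_* by proximal gradient descent with singular value thresholding. It is a generic illustration, not the authors' model. The repository name suggests sparse and low-rank terms, but the exact formulation is not given here, and the sparse part is omitted. All names are hypothetical.

```python
import numpy as np


def svt(W: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(X: np.ndarray, Y: np.ndarray, W: np.ndarray, lam: float) -> float:
    """0.5 * squared Frobenius residual plus lam * nuclear norm of W."""
    r = X @ W - Y
    return 0.5 * float(np.sum(r * r)) + lam * float(np.sum(np.linalg.svd(W, compute_uv=False)))


def fit_low_rank(X: np.ndarray, Y: np.ndarray, lam: float, n_iter: int = 500) -> np.ndarray:
    """Proximal gradient for 0.5*||XW - Y||_F^2 + lam*||W||_*; columns of Y are time points."""
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the smooth part
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = svt(W - step * grad, step * lam)
    return W
```

The nuclear-norm penalty couples the time points by shrinking W toward a few shared directions, so the interrelation between tasks is learned from data rather than fixed in advance. With step size 1/L the objective does not increase between iterations.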
A concurrent subtractive assembly approach for identification of disease associated sub-metagenomes.
Wontack Han, Mingjie Wang, Yuzhen Ye

Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets individually or as a whole, followed by statistical tests to identify differentially abundant species/genes. We previously developed subtractive assembly (SA), a de novo assembly approach for comparative metagenomics that first detects differential reads that distinguish between two groups of metagenomes and then assembles only these reads. Application of SA to type 2 diabetes (T2D) microbiomes revealed new microbial genes associated with T2D. Here we further developed a Concurrent Subtractive Assembly (CoSA) approach, which uses a Wilcoxon rank-sum (WRS) test to detect k-mers that are differentially abundant between two groups of microbiomes (by contrast, SA only checks ratios of k-mer counts in one pooled sample versus the other). It then uses the identified differential k-mers to extract reads that are likely sequenced from the sub-metagenome with consistent abundance differences between the groups of microbiomes. Further, CoSA attempts to reduce the redundancy of reads (from abundant common species) by excluding reads containing abundant k-mers. Using simulated microbiome datasets and T2D datasets, we show that CoSA performs strikingly better than SA at detecting consistent changes, and that it enables the detection and assembly of genomes and genes with minor abundance differences. An SVM classifier built on the microbial genes that CoSA detected in the T2D datasets accurately discriminates patients from healthy controls, with an AUC of 0.94 (10-fold cross-validation); therefore, these differential genes (207 genes) may serve as potential microbial marker genes for T2D.

RECOMB, vol. 2017, pp. 18-33; published 2017-01-01. DOI: 10.1007/978-3-319-56970-3_2
Citations: 8
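The Wilcoxon rank-sum step described in the abstract above can be sketched like this. Count k-mers per sample, then rank each k-mer's per-sample counts across the two groups, using average ranks and a tie-corrected normal approximation. This sketch assumes raw per-sample counts; CoSA's actual normalization, thresholds, abundant-k-mer filtering and read extraction are not described here, and all names are hypothetical. `scipy.stats.mannwhitneyu` implements the equivalent Mann-Whitney U test if SciPy is available.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def kmer_counts(reads: Sequence[str], k: int) -> Counter:
    """Count every k-mer in one sample's reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def rank_sum_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Wilcoxon rank-sum z-score for x vs y (average ranks, tie-corrected variance)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    if n < 2:
        return 0.0
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = 0.0       # sum of the ranks belonging to x
    tie_term = 0  # sum of (t^3 - t) over blocks of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        w += avg_rank * sum(1 for m in range(i, j + 1) if pooled[m][1] == 0)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return 0.0 if var <= 1e-12 else (w - mean) / math.sqrt(var)


def differential_kmers(group_a: List[List[str]], group_b: List[List[str]],
                       k: int, z_cut: float) -> Dict[str, float]:
    """k-mers whose per-sample counts differ between groups with |z| >= z_cut."""
    profiles_a = [kmer_counts(sample, k) for sample in group_a]
    profiles_b = [kmer_counts(sample, k) for sample in group_b]
    out: Dict[str, float] = {}
    for kmer in sorted(set().union(*profiles_a, *profiles_b)):
        z = rank_sum_z([p[kmer] for p in profiles_a], [p[kmer] for p in profiles_b])
        if abs(z) >= z_cut:
            out[kmer] = z
    return out
```

Unlike a ratio of pooled counts, the rank-based test rewards k-mers whose abundance differs consistently across individual samples. This matches the abstract's motivation for replacing SA's pooled comparison.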
Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks.
Hyunghoon Cho, Bonnie Berger, Jian Peng
RECOMB, vol. 9029, pp. 62-64; published 2015-04-01. DOI: 10.1007/978-3-319-16706-0_9
Citations: 73
HapTree-X: An Integrative Bayesian Framework for Haplotype Reconstruction from Transcriptome and Genome Sequencing Data.
Emily Berger, Deniz Yorukoglu, Bonnie Berger
RECOMB, vol. 9029, pp. 28-29; published 2015-01-01. DOI: 10.1007/978-3-319-16706-0_4
Citations: 9