Optimized splitting of mixed-species RNA sequencing data.

IF 0.7 4区生物学 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Journal of Bioinformatics and Computational Biology Pub Date : 2022-04-01 Epub Date: 2022-01-06 DOI:10.1142/S0219720022500019

Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart

{"title":"Optimized splitting of mixed-species RNA sequencing data.","authors":"Xuan Song, Hai Yun Gao, Karl Herrup, Ronald P Hart","doi":"10.1142/S0219720022500019","DOIUrl":null,"url":null,"abstract":"<p><p>Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven to be valuable to uncover cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are re-aligned with individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, a more traditional approach of mixed-genome alignment followed by optimized separation of reads proved to be the more successful with lower error rates.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"20 2","pages":"2250001"},"PeriodicalIF":0.7000,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9081140/pdf/nihms-1770823.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Bioinformatics and Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1142/S0219720022500019","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/1/6 0:00:00","PubModel":"Epub","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven to be valuable to uncover cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species, which are re-aligned with individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved patterns of sequences from two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, a more traditional approach of mixed-genome alignment followed by optimized separation of reads proved to be the more successful with lower error rates.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

混合物种 RNA 测序数据的优化分割。

事实证明，利用异种移植或共培养系统（通常是人鼠混合细胞）进行基因表达研究，对于揭示发育过程中或疾病模型中的细胞动态非常有价值。然而，物种间 mRNA 序列的相似性给转录本的精确定量带来了挑战。为了确定分析混合物种 RNA 测序数据的最佳策略，我们评估了依赖配准和不依赖配准的方法。将读数与集合参考索引进行比对是有效的，特别是如果使用最佳比对将测序读数按物种分类，然后与单个基因组重新比对，这样就能在一系列物种比例中产生[公式：见正文]准确性。独立于配准的方法，如卷积神经网络，提取两个物种序列的保守模式，对 RNA 测序读数进行分类的准确率超过 85%。重要的是，这两种方法在人类和小鼠读数比例不同的情况下表现良好。虽然非配准策略成功地按物种划分了读数，但事实证明，先混合基因组配准再优化分离读数的传统方法更成功，错误率更低。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Bioinformatics and Computational Biology MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

2.10

自引率

0.00%

发文量

期刊介绍： The Journal of Bioinformatics and Computational Biology aims to publish high quality, original research articles, expository tutorial papers and review papers as well as short, critical comments on technical issues associated with the analysis of cellular information. The research papers will be technical presentations of new assertions, discoveries and tools, intended for a narrower specialist community. The tutorials, reviews and critical commentary will be targeted at a broader readership of biologists who are interested in using computers but are not knowledgeable about scientific computing, and equally, computer scientists who have an interest in biology but are not familiar with current thrusts nor the language of biology. Such carefully chosen tutorials and articles should greatly accelerate the rate of entry of these new creative scientists into the field.