Capacity of the Shotgun Sequencing Channel

2022 IEEE International Symposium on Information Theory (ISIT). Pub Date: 2022-06-26. DOI: 10.1109/ISIT50566.2022.9834409

Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

Citations: 0

Abstract

Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in [1], is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem; i.e., the scenario in which the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that while in the uncoded case, O(n) reads of length greater than 2 log n are needed for reliable reconstruction of a length-n binary sequence, in the coded case, only O(n/log n) reads of length greater than log n are needed for the capacity to be arbitrarily close to 1.
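The shotgun read process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model or code: the function names `sample_reads` and `coverage_fraction`, the wrap-around at the end of the sequence, and the specific values of `n`, `read_len`, and `num_reads` are all assumptions chosen for illustration. The sketch draws reads from uniformly random start positions in a length-n binary sequence, discards the positions to form the channel output, and reports what fraction of the sequence is covered by at least one read.

```python
import math
import random


def sample_reads(codeword, num_reads, read_len, rng):
    """Return (start, read) pairs drawn from uniformly random start positions."""
    n = len(codeword)
    pairs = []
    for _ in range(num_reads):
        start = rng.randrange(n)
        # Assumption for this sketch: reads wrap around the end of the sequence.
        read = "".join(codeword[(start + i) % n] for i in range(read_len))
        pairs.append((start, read))
    return pairs


def coverage_fraction(n, starts, read_len):
    """Fraction of the n positions that fall inside at least one read."""
    covered = set()
    for start in starts:
        for i in range(read_len):
            covered.add((start + i) % n)
    return len(covered) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1024
codeword = "".join(rng.choice("01") for _ in range(n))

log_n = int(math.log2(n))
read_len = 2 * log_n        # hypothetical choice: reads of length 2 log n
num_reads = 4 * n // log_n  # hypothetical choice: on the order of n / log n reads

pairs = sample_reads(codeword, num_reads, read_len, rng)

# The decoder sees only the reads, not where they came from:
# sorting discards the sampling order and the start positions.
channel_output = sorted(read for _, read in pairs)

starts = [start for start, _ in pairs]
print("reads observed:", len(channel_output))
print("fraction of positions covered:", coverage_fraction(n, starts, read_len))
```

Varying `read_len` and `num_reads` shows how much of the sequence is left unobserved at a given setting. The sketch only simulates the read process; it does not compute the capacity characterized in the paper.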


