Johan Chrisnata;Han Mao Kiah;Alexander Vardy;Eitan Yaakobi
{"title":"Bee Identification Problem for DNA Strands","authors":"Johan Chrisnata;Han Mao Kiah;Alexander Vardy;Eitan Yaakobi","doi":"10.1109/JSAIT.2023.3294423","DOIUrl":null,"url":null,"abstract":"Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all <inline-formula> <tex-math notation=\"LaTeX\">$M$ </tex-math></inline-formula> codewords from a codebook over some channel and each codeword results in <inline-formula> <tex-math notation=\"LaTeX\">$N$ </tex-math></inline-formula> noisy outputs. Then our task is to identify each codeword from this unordered set of <inline-formula> <tex-math notation=\"LaTeX\">$MN$ </tex-math></inline-formula> noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in <inline-formula> <tex-math notation=\"LaTeX\">$O(M^{3})$ </tex-math></inline-formula> time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is <inline-formula> <tex-math notation=\"LaTeX\">$o(M^{3})$ </tex-math></inline-formula>. For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"190-204"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE journal on selected areas in information theory","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10179132/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all $M$ codewords from a codebook over some channel and each codeword results in $N$ noisy outputs. Then our task is to identify each codeword from this unordered set of $MN$ noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in $O(M^{3})$ time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is $o(M^{3})$ . For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.