Bee Identification Problem for DNA Strands
Johan Chrisnata, Han Mao Kiah, Alexander Vardy, Eitan Yaakobi
IEEE Journal on Selected Areas in Information Theory, vol. 4, pp. 190-204, 2023. DOI: 10.1109/JSAIT.2023.3294423. https://ieeexplore.ieee.org/document/10179132/
Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, all $M$ codewords of a codebook are transmitted over some channel, and each codeword results in $N$ noisy outputs. The task is then to identify each codeword from the resulting unordered set of $MN$ noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network, called the input-output flow network, we show that the problem can be solved in $O(M^{3})$ time in the worst case. Next, we consider the deletion and insertion channels individually and, in both cases, study the expected number of edges in the corresponding input-output flow networks. Specifically, we obtain closed-form expressions for this quantity for certain codebooks. When the codebook comprises all binary words, we show that this quantity is sub-quadratic whenever the deletion or insertion probability is less than 1/2. This implies that the expected running time of joint decoding for this codebook is $o(M^{3})$. For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.
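The reduction to minimum-cost flow can be illustrated with a small sketch. It assumes an i.i.d. deletion channel with deletion probability $p$ and uses the negative log-likelihood $-\log P(y \mid x)$ as the cost of matching output $y$ to codeword $x$. An edge of the input-output network joins $x$ and $y$ only when $P(y \mid x) > 0$, that is, when $y$ is a subsequence of $x$. The source feeds $N$ units into every codeword node and every output node drains one unit into the sink. The function names (`count_embeddings`, `deletion_likelihood`, `identify`) are hypothetical. The solver is a plain successive-shortest-path method with Bellman-Ford, chosen for brevity; it is not claimed to be the algorithm behind the $O(M^{3})$ bound.

```python
import math


def count_embeddings(x: str, y: str) -> int:
    """Number of index sets at which y occurs as a subsequence of x."""
    dp = [1] + [0] * len(y)  # dp[j]: ways to embed y[:j] in the prefix of x read so far
    for c in x:
        for j in range(len(y), 0, -1):  # backwards so each symbol of x is used once
            if y[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[len(y)]


def deletion_likelihood(x: str, y: str, p: float) -> float:
    """P(y | x) for an i.i.d. deletion channel (assumed model) with deletion prob. p."""
    d = len(x) - len(y)
    if d < 0:
        return 0.0
    return count_embeddings(x, y) * p ** d * (1 - p) ** len(y)


def identify(codebook, outputs, n_copies, p):
    """Assign each output to a codeword so that every codeword receives exactly
    n_copies outputs and the total -log likelihood is minimal (None if infeasible)."""
    M, K = len(codebook), len(outputs)
    if K != M * n_copies:
        raise ValueError("expected exactly M * N outputs")
    s, t = M + K, M + K + 1  # nodes: codewords 0..M-1, outputs M..M+K-1, source, sink
    graph = [[] for _ in range(M + K + 2)]
    to, cap, cost = [], [], []

    def add_edge(u, v, c, w):
        # forward edge at an even index, its residual twin at the next odd index
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    for i in range(M):
        add_edge(s, i, n_copies, 0.0)
    for j in range(K):
        add_edge(M + j, t, 1, 0.0)
    middle = {}  # forward edge index -> (codeword, output): the input-output edges
    for i, x in enumerate(codebook):
        for j, y in enumerate(outputs):
            lik = deletion_likelihood(x, y, p)
            if lik > 0:  # only plausible (codeword, output) pairs become edges
                middle[len(to)] = (i, j)
                add_edge(i, M + j, 1, -math.log(lik))

    for _ in range(K):  # push one unit per round along a cheapest residual path
        dist = [math.inf] * len(graph)
        prev = [-1] * len(graph)  # edge used to reach each node
        dist[s] = 0.0
        for _ in range(len(graph) - 1):  # Bellman-Ford: reverse edges can be negative
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]] - 1e-12:
                        dist[to[e]] = dist[u] + cost[e]
                        prev[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[t] == math.inf:
            return None  # no assignment gives every codeword exactly n_copies outputs
        v = t
        while v != s:  # every s-t path has bottleneck 1 (edges into t have capacity 1)
            e = prev[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]  # tail of edge e

    assignment = [None] * K
    for e, (i, j) in middle.items():
        if cap[e] == 0:  # saturated forward edge: output j is decoded as codeword i
            assignment[j] = i
    return assignment


print(identify(["0011", "0101"], ["01", "011"], 1, 0.1))
```

With codebook `["0011", "0101"]`, outputs `["01", "011"]`, $N = 1$ and $p = 0.1$, the sketch prints `[1, 0]`. Here `"01"` embeds 3 times in `"0101"` and `"011"` embeds twice in `"0011"`, so this assignment has likelihood $6p^{3}(1-p)^{5}$, against $4p^{3}(1-p)^{5}$ for the alternative. The dictionary `middle` holds exactly the edges of the input-output network, so a codebook and channel that yield fewer plausible pairs give the flow solver a smaller graph to search.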