Pub Date: 2024-04-16; DOI: 10.1109/TMBMC.2024.3388971
Ward J. P. Spee;Jos H. Weber
Recent advances in DNA data storage have attracted renewed attention towards deletion, insertion and substitution correcting codes. Compared to codes aimed at correcting either substitution errors or deletion and insertion (indel) errors, the understanding of codes that correct combinations of substitution and indel errors lags behind. In this paper, we focus on the maximal size of q-ary t-indel s-substitution correcting codes. Our main contributions include two Gilbert-Varshamov inspired lower bounds on this size. On the upper bound side, we prove a Singleton-like bound, a family of sphere-packing upper bounds and an integer linear programming bound. Several of these bounds are shown to improve upon existing results. Moreover, we use these bounds to derive a lower bound and an upper bound on the asymptotic redundancy of maximally sized t-indel s-substitution correcting codes.
Title: Bounds on the Maximum Cardinality of Indel and Substitution Correcting Codes
Journal: IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, vol. 10, no. 2, pp. 349-358
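As a point of reference for the bounds named in the abstract above, the following minimal sketch computes the classical Gilbert-Varshamov lower bound and sphere-packing (Hamming) upper bound for the substitution-only special case (t = 0). It does not reproduce the paper's bounds for combined indel and substitution errors; the function names are hypothetical and chosen only for illustration.

```python
from math import comb


def hamming_ball_size(n: int, q: int, r: int) -> int:
    """Number of q-ary words of length n within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def gv_lower_bound(n: int, q: int, s: int) -> int:
    """Classical Gilbert-Varshamov bound for s-substitution correcting codes.

    Such a code needs minimum distance 2s + 1, so a code of size at least
    ceil(q^n / V(n, 2s)) exists. Substitution-only case, not the paper's bound.
    """
    ball = hamming_ball_size(n, q, 2 * s)
    return -(-(q ** n) // ball)  # integer ceiling division


def sphere_packing_upper_bound(n: int, q: int, s: int) -> int:
    """Classical Hamming bound: radius-s balls around codewords must be disjoint."""
    return (q ** n) // hamming_ball_size(n, q, s)


if __name__ == "__main__":
    # Binary, length 7, single-substitution correction.
    print(gv_lower_bound(7, 2, 1), sphere_packing_upper_bound(7, 2, 1))
```

For binary codes of length 7 correcting one substitution, the upper bound evaluates to 16, which the [7,4] Hamming code attains. The gap between the two values shows why tighter bounds, such as those the paper develops for the combined error model, are of interest.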
Pub Date: 2024-04-15; DOI: 10.1109/TMBMC.2024.3388977
Steven S. Andrews
Biological systems often include spatial regions with different diffusion coefficients. Explicitly simulating their physical causes is computationally intensive, so it is typically preferable to simply vary the coefficients. This raises the question of how to address the boundaries between the regions. Making them fully permeable in both directions seems intuitively reasonable, but causes molecular motion to be simulated as active diffusion, meaning that it arises from energy that is continuously added to the system; in this case, molecules accumulate on the slow-diffusing side. However, molecular motion in most biochemical systems is better described as thermal diffusion, meaning that it occurs even at equilibrium. This can be simulated by reducing the transmission probability into the slow-diffusing side, which yields the correct result that spatially varying diffusion coefficients that arise from macromolecular crowding, changes in viscosity, or other energy-neutral influences do not affect equilibrium molecular concentrations. This work presents transmission coefficients and transmission probability equations for simulating thermal diffusion, including for cases with free energy differences and/or volume exclusion by crowders. They have been implemented in the Smoldyn particle-based simulation software.
Title: Modeling Diffusion Between Regions With Different Diffusion Coefficients
Journal: IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, vol. 10, no. 3, pp. 425-432
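The contrast between fully permeable boundaries and reduced transmission described in the abstract above can be illustrated with a simple flux-balance calculation. The sketch below assumes a fixed time step, Gaussian displacements with variance 2·D·dt, and no free-energy difference or volume exclusion. The transmission probability sqrt(D_slow / D_fast) is one choice that satisfies this flux balance under those assumptions; it is not taken from the paper's equations, and all function names are hypothetical.

```python
import math


def mean_outward_step(d: float, dt: float) -> float:
    """Mean one-sided displacement E[max(X, 0)] for X ~ Normal(0, 2*d*dt).

    For uniform concentration c near a boundary, c times this value is the
    number of molecules (per unit area) that reach across it in one step.
    """
    return math.sqrt(d * dt / math.pi)


def equilibrium_ratio(d_fast: float, d_slow: float,
                      p_fast_to_slow: float, p_slow_to_fast: float,
                      dt: float = 1.0) -> float:
    """Concentration ratio c_slow / c_fast at which the two crossing fluxes cancel."""
    flux_per_conc_fast = mean_outward_step(d_fast, dt) * p_fast_to_slow
    flux_per_conc_slow = mean_outward_step(d_slow, dt) * p_slow_to_fast
    return flux_per_conc_fast / flux_per_conc_slow


def thermal_transmission(d_fast: float, d_slow: float) -> tuple[float, float]:
    """Assumed probabilities (fast->slow, slow->fast) that equalize concentrations."""
    return math.sqrt(d_slow / d_fast), 1.0


if __name__ == "__main__":
    d_fast, d_slow = 4.0, 1.0
    # Fully permeable in both directions: molecules pile up on the slow side.
    print(equilibrium_ratio(d_fast, d_slow, 1.0, 1.0))
    # Reduced transmission into the slow side: concentrations are equal.
    p_fs, p_sf = thermal_transmission(d_fast, d_slow)
    print(equilibrium_ratio(d_fast, d_slow, p_fs, p_sf))
```

With a fourfold difference in diffusion coefficients, the fully permeable boundary gives a slow-side concentration twice that of the fast side, while the reduced transmission probability restores equal concentrations, consistent with the qualitative behaviour the abstract describes.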
Pub Date: 2024-03-31; DOI: 10.1109/TMBMC.2024.3408053
Inbal Preuss;Ben Galili;Zohar Yakhini;Leon Anavy
This study introduces a novel model for analyzing and determining the required sequencing coverage in DNA-based data storage, focusing on combinatorial DNA encoding. We seek to characterize the distribution of the number of sequencing reads required for message reconstruction. We use a variant of the coupon collector distribution for this purpose. For any given number of observed reads, $R \in \mathbb{N}$