Capacity-Achieving Constrained Codes with GC-Content and Runlength Limits for DNA Storage
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834494
Yajuan Liu, Xuan He, Xiaohu Tang
GC-content and homopolymer run-length are two constraints of interest in DNA storage systems. Extensive experiments have shown that if the GC-content of a DNA sequence is too high or too low, or if a homopolymer run exceeds six, insertion, deletion, and substitution errors increase dramatically. Addressing DNA sequences under both constraints, a recent work (Nguyen et al. 2020) proposed a class of (ϵ, ℓ)-constrained codes that only approach capacity asymptotically and may incur a noticeable rate loss at finite code lengths. In this paper, we design the first (ϵ, ℓ)-constrained codes, based on the enumerative coding technique, that achieve capacity regardless of code length. In addition, motivated by the influence of local GC-content, we consider for the first time the nontrivial case in which the prefixes of a DNA sequence must also satisfy a GC-content constraint, yielding what we call (δ, ℓ)-prefix constrained codes.
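To make the enumerative coding idea concrete, here is a minimal Python sketch of Cover-style enumerative unranking over binary sequences whose homopolymer runs never exceed a limit. It is our illustration only: the limit ELL = 3 and the binary alphabet are assumptions, and the paper's construction additionally enforces the GC-content constraint over the quaternary DNA alphabet.

from functools import lru_cache

ELL = 3  # assumed homopolymer-run limit (the paper's limit is a parameter)

@lru_cache(maxsize=None)
def completions(n, prev, run):
    """Number of valid length-n suffixes, given that the last emitted
    symbol was `prev` and its current run length is `run`."""
    if n == 0:
        return 1
    total = 0
    for b in (0, 1):
        if b == prev:
            if run < ELL:
                total += completions(n - 1, b, run + 1)
        else:
            total += completions(n - 1, b, 1)
    return total

def unrank(index, n):
    """Enumerative decoding: map an integer to the index-th constrained
    string in lexicographic order; its inverse (ranking) is the encoder."""
    out, prev, run = [], None, 0
    for pos in range(n):
        for b in (0, 1):
            if b == prev and run == ELL:
                continue  # would create a forbidden run of length ELL + 1
            r = run + 1 if b == prev else 1
            c = completions(n - pos - 1, b, r)
            if index < c:
                out.append(b)
                prev, run = b, r
                break
            index -= c
    return out

total = completions(6, None, 0)
seqs = [tuple(unrank(i, 6)) for i in range(total)]
assert len(set(seqs)) == total  # injective: every index yields a distinct sequence

Because the counts enumerate the constrained sequences exactly, the per-block rate loss of such a scheme is less than one bit at any length, which is the kind of finite-length guarantee the paper contrasts with asymptotic-only constructions.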
{"title":"Capacity-Achieving Constrained Codes with GC-Content and Runlength Limits for DNA Storage","authors":"Yajuan Liu, Xuan He, Xiaohu Tang","doi":"10.1109/ISIT50566.2022.9834494","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834494","url":null,"abstract":"GC-content and homopolymer run are two constraints of interest in DNA storage systems. Extensive experiments showed that if GC-content is too high (low), or homopolymer run exceeds six in a DNA sequence, there will give rise to dramatical increase of insertion, deletion and substitution errors. Committing to study the DNA sequences with both constraints, a recent work (Nguyen et al. 2020) proposed a class of (ϵ, ℓ)-constrained codes that can only asymptotically approach the capacity, but may have reasonable loss for finite code lengths.In this paper, we design the first (ϵ, ℓ)-constrained codes based on the enumeration coding technique which can always achieve capacity regardless of code lengths. In addition, motivated by the influence of local GC-content, we consider a nontrivial case that the prefixes of a DNA sequence also hold GC-content constraint for the first time, called (δ,ℓ)-prefix constrained codes.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115758673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PCR, Tropical Arithmetic, and Group Testing
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834718
Hsin-Po Wang, Ryan Gabrys, A. Vardy
Polymerase chain reaction (PCR) testing is the gold standard for diagnosing COVID-19. Unfortunately, the outputs of these tests are imprecise, and therefore quantitative group testing methods, which rely on precise measurements, are not applicable. Motivated by the ever-increasing demand to identify individuals infected with SARS-CoV-2, we propose a new model that leverages tropical arithmetic to characterize the PCR testing process. In many cases, some of which are highlighted in this work, tropical group testing is provably more powerful than traditional binary group testing: it requires fewer tests than classical approaches, while additionally providing a mechanism to identify the viral load of each infected individual.
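In tropical (min-plus) arithmetic, a battery of PCR pools becomes a matrix-vector product: the cycle count reported by a pool is governed by its most infected member. The sketch below is our reading of that model; the membership matrix, cycle counts, and the use of infinity for healthy individuals are illustrative assumptions.

import numpy as np

INF = np.inf  # the tropical 'zero': a healthy individual never amplifies

def tropical_tests(A, x):
    """Min-plus matrix-vector product: pool i reports the smallest cycle
    count among its members, y_i = min_j (A_ij + x_j), where A_ij is 0 if
    individual j is in pool i and INF otherwise."""
    return np.min(A + x[None, :], axis=1)

# Toy instance (our own illustration, not the paper's test design):
# 4 individuals; individual 2 is infected with cycle count 20.
x = np.array([INF, INF, 20.0, INF])
A = np.where(np.array([[1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1]]) == 1, 0.0, INF)
print(tropical_tests(A, x))  # [inf 20. 20.]: the two pools containing individual 2 flag it

Reading off the finite entries both detects the infection and reveals the viral load (a smaller cycle count means a higher load), which is exactly the extra information that binary group testing discards.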
{"title":"PCR, Tropical Arithmetic, and Group Testing","authors":"Hsin-Po Wang, Ryan Gabrys, A. Vardy","doi":"10.1109/ISIT50566.2022.9834718","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834718","url":null,"abstract":"Polymerase chain reaction (PCR) testing is the gold standard for diagnosing COVID-19. Unfortunately, the outputs of these tests are imprecise and therefore quantitative group testing methods, which rely on precise measurements, are not applicable. Motivated by the ever-increasing demand to identify individuals infected with SARS-CoV-19, we propose a new model that leverages tropical arithmetic to characterize the PCR testing process. In many cases, some of which are highlighted in this work, tropical group testing is provably more powerful than traditional binary group testing in that it requires fewer tests than classical approaches, while additionally providing a mechanism to identify the viral load of each infected individual.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124306213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Group Testing with Geometric Ranges
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834574
Benjamin Aram Berendsohn, L. Kozma
Group testing is a well-studied approach for identifying t defective items in a set X of m items, by testing appropriately chosen subsets of X. In classical group testing any subset of X can be tested, and for $t \in \mathcal{O}(1)$ the optimal number of (non-adaptive) tests is known to be $\Theta(\log m)$. In this work we consider a novel geometric setting for group testing, where the items are points in Euclidean space and the tests are axis-parallel boxes (hyperrectangles), corresponding to the scenario where tests are defined by parameter ranges (say, according to physical measurements). We present upper and lower bounds on the required number of tests in this setting, observing that, in contrast to the unrestricted combinatorial case, the bounds are polynomial in m. For instance, we show that with two parameters, identifying a defective pair of items requires $\Omega(m^{3/5})$ tests, and there exist configurations for which $\mathcal{O}(m^{2/3})$ tests are sufficient, whereas to identify a single defective item $\Theta(m^{1/2})$ tests are always necessary and sometimes sufficient. Perhaps most interestingly, our work brings to the study of group testing a set of techniques from extremal combinatorics.
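The geometric model is easy to simulate. The sketch below (ours, not a construction from the paper) draws random axis-parallel box tests in two dimensions; a test is positive iff it contains the defective point, and non-adaptive decoding keeps exactly the items whose box-membership pattern matches the observed outcomes.

import numpy as np

rng = np.random.default_rng(0)

def run_tests(points, boxes, defective):
    """Outcome of each box test for a single defective: positive iff the
    box contains the defective point.  boxes[i] = (lo, hi) corner vectors."""
    p = points[defective]
    return np.array([np.all((lo <= p) & (p <= hi)) for lo, hi in boxes])

def candidates(points, boxes, outcomes):
    """Non-adaptive decoding: an item survives iff its membership pattern
    across all boxes matches the observed outcomes."""
    keep = []
    for j, p in enumerate(points):
        pattern = np.array([np.all((lo <= p) & (p <= hi)) for lo, hi in boxes])
        if np.array_equal(pattern, outcomes):
            keep.append(j)
    return keep

# Toy 2D instance: random boxes identify the defective iff they separate
# every pair of points; the bounds in the abstract ask how few boxes suffice.
m, T = 30, 12
pts = rng.random((m, 2))
boxes = [(np.minimum(a, b), np.maximum(a, b)) for a, b in rng.random((T, 2, 2))]
out = run_tests(pts, boxes, defective=7)
print(candidates(pts, boxes, out))  # always contains 7; a singleton iff point 7 is separated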
{"title":"Group Testing with Geometric Ranges","authors":"Benjamin Aram Berendsohn, L. Kozma","doi":"10.1109/ISIT50566.2022.9834574","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834574","url":null,"abstract":"Group testing is a well-studied approach for identifying t defective items in a set X of m items, by testing appropriately chosen subsets of X. In classical group testing any subset of X can be tested, and for $t in {mathcal{O}}(1)$ the optimal number of (non-adaptive) tests is known to be Θ(logm).In this work we consider a novel geometric setting for group testing, where the items are points in Euclidean space and the tests are axis-parallel boxes (hyperrectangles), corresponding to the scenario where tests are defined by parameter-ranges (say, according to physical measurements). We present upper and lower bounds on the required number of tests in this setting, observing that in contrast to the unrestricted, combinatorial case, the bounds are polynomial in m. For instance, we show that with two parameters, identifying a defective pair of items requires Ω(m3/5) tests, and there exist configurations for which ${mathcal{O}}left({{m^{2/3}}}right)$ tests are sufficient, whereas to identify a single defective item Θ(m1/2) tests are always necessary and sometimes sufficient. Perhaps most interestingly, our work brings to the study of group testing a set of techniques from extremal combinatorics.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124538043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capacity of the Shotgun Sequencing Channel
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834409
Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony
Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in [1], is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem; i.e., the scenario in which the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that while in the uncoded case O(n) reads of length greater than 2 log n are needed for reliable reconstruction of a length-n binary sequence, in the coded case only O(n/log n) reads of length greater than log n are needed for the capacity to be arbitrarily close to 1.
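A quick simulation of the coverage side of this question (our illustration: the circular-sequence convention and the constants are assumptions, and full coverage is only necessary, not sufficient, for uncoded reconstruction, which additionally needs reads long enough to avoid repeated substrings):

import numpy as np

rng = np.random.default_rng(1)

def fully_covered(n, num_reads, L):
    """One shotgun experiment: num_reads reads of length L start at uniform
    positions on a circular length-n sequence; did they cover every base?"""
    mask = np.zeros(n, dtype=bool)
    for s in rng.integers(0, n, size=num_reads):
        mask[(s + np.arange(L)) % n] = True
    return mask.all()

n = 2000
L = int(2 * np.log2(n)) + 1                 # read length just above 2 log n
for c in (0.5, 1.0, 2.0):
    k = int(c * (n / L) * np.log(n))        # coupon-collector-style scaling, illustrative
    hit = np.mean([fully_covered(n, k, L) for _ in range(30)])
    print(f"{k} reads -> P(full coverage) ~ {hit:.2f}")

The sharp transition around k on the order of (n/L) log n reads illustrates why coverage depth, and not just read length, enters the capacity characterization.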
{"title":"Capacity of the Shotgun Sequencing Channel","authors":"Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony","doi":"10.1109/ISIT50566.2022.9834409","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834409","url":null,"abstract":"Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in [1], is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem; i.e., the scenario in which the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that while in the uncoded case, O(n) reads of length greater than 2logn are needed for reliable reconstruction of a length-n binary sequence, in the coded case, only O(n/log n) reads of length greater than log n are needed for the capacity to be arbitrarily close to 1.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114370202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning neural codes for perceptual uncertainty
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834606
M. Salmasi, M. Sahani
Perception is an inferential process, in which the state of the immediate environment must be estimated from sensory input. Inference in the face of noise and ambiguity requires reasoning with uncertainty, and much animal behaviour appears close to Bayes optimal. This observation has inspired hypotheses for how the activity of neurons in the brain might represent the distributional beliefs necessary to implement explicit Bayesian computation. While previous work has focused on the sufficiency of these hypothesised codes for computation, relatively little consideration has been given to optimality in the representation itself. Here, we adopt an encoder-decoder approach to study representational optimisation within one hypothesised belief encoding framework: the distributed distributional code (DDC). We consider a setting in which typical belief distribution functions take the form of a sparse combination of an underlying set of basis functions, and the corresponding DDC signals are corrupted by neural variability. We estimate the conditional entropy over beliefs induced by these DDC signals using an appropriate decoder. Like other hypothesised frameworks, a DDC representation of a belief depends on a set of fixed encoding functions that are usually set arbitrarily. Our approach allows us to seek the encoding functions that minimise the decoder conditional entropy and thus optimise representational accuracy in an information theoretic sense. We apply the approach to show how optimal encoding properties may adapt to represent beliefs in new environments, relating the results to experimentally reported neural responses.
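A minimal numerical sketch of a DDC as described here, with the representational optimisation itself (the paper's contribution) omitted: a belief q(z) is represented by noisy expectations of fixed encoding functions. The Gaussian encoding functions, the grid, and the noise model are our assumptions.

import numpy as np

rng = np.random.default_rng(2)

z = np.linspace(-5, 5, 501)                  # latent variable grid
dz = z[1] - z[0]
centres = np.linspace(-4, 4, 12)
psi = np.exp(-0.5 * (z[None, :] - centres[:, None]) ** 2)  # fixed encoding functions

def ddc(q, noise_sd=0.05):
    """DDC signal for a belief q(z): r_k = E_q[psi_k(z)], corrupted by
    additive neural variability (Gaussian here, an assumption)."""
    q = q / (q.sum() * dz)                   # normalise the belief
    r = (psi * q[None, :]).sum(axis=1) * dz  # expectations under q
    return r + noise_sd * rng.normal(size=r.shape)

belief = np.exp(-0.5 * ((z - 1.0) / 0.7) ** 2)  # an example belief over z
print(ddc(belief).round(3))

In the paper's framing, the question is which choice of the psi_k minimises the conditional entropy over beliefs given such noisy signals, rather than fixing them arbitrarily as done here.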
{"title":"Learning neural codes for perceptual uncertainty","authors":"M. Salmasi, M. Sahani","doi":"10.1109/ISIT50566.2022.9834606","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834606","url":null,"abstract":"Perception is an inferential process, in which the state of the immediate environment must be estimated from sensory input. Inference in the face of noise and ambiguity requires reasoning with uncertainty, and much animal behaviour appears close to Bayes optimal. This observation has inspired hypotheses for how the activity of neurons in the brain might represent the distributional beliefs necessary to implement explicit Bayesian computation. While previous work has focused on the sufficiency of these hypothesised codes for computation, relatively little consideration has been given to optimality in the representation itself. Here, we adopt an encoder-decoder approach to study representational optimisation within one hypothesised belief encoding framework: the distributed distributional code (DDC). We consider a setting in which typical belief distribution functions take the form of a sparse combination of an underlying set of basis functions, and the corresponding DDC signals are corrupted by neural variability. We estimate the conditional entropy over beliefs induced by these DDC signals using an appropriate decoder. Like other hypothesised frameworks, a DDC representation of a belief depends on a set of fixed encoding functions that are usually set arbitrarily. Our approach allows us to seek the encoding functions that minimise the decoder conditional entropy and thus optimise representational accuracy in an information theoretic sense. We apply the approach to show how optimal encoding properties may adapt to represent beliefs in new environments, relating the results to experimentally reported neural responses.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"301 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114583776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Information-Debt-Optimal Streaming Codes With Small Memory
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834842
Vinayak Ramkumar, M. Krishnan, Myna Vajha, P. V. Kumar
In the context of an (n,k,m) convolutional code where k is the number of message symbols, n the number of code symbols and m the memory, Martinian [1] introduced the concept of information debt whose value at time t is the number of additional coded symbols needed to decode all prior message symbols. The same paper shows the existence of (n,k,m) convolutional codes that can recover all prior message symbols whenever the symbol-erasure pattern is such that the maximum time interval τ between successive returns to zero of the information debt function is at most m. The parameter τ also represents the worst-case delay in decoding a message symbol. In the present paper, we study (n,k,m) convolutional codes that possess the analogous property for the case τ > m whenever it is possible to do so. We will refer to such codes as information-debt-optimal streaming (iDOS) codes. We prove the existence of periodically time-varying iDOS codes for all possible {n,k,m,τ} parameters. We also show that m-MDS codes and Maximum Distance Profile convolutional codes are iDOS codes for certain parameter ranges. As a by-product of our existence result, the minimum memory needed for a particular class of streaming codes studied earlier in the literature, is determined.
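Under our reading of the definition, the information-debt trajectory for a given erasure pattern follows a one-line recursion, and the worst-case delay τ is the longest gap between its returns to zero. The bookkeeping below is a sketch and may differ in detail from Martinian's formulation.

def information_debt(k, n, received):
    """Debt trajectory for one erasure pattern: each time step brings k new
    message symbols and delivers received[t] <= n unerased code symbols;
    the debt is the running shortfall of coded symbols, floored at zero."""
    debt, traj = 0, []
    for r in received:
        assert r <= n
        debt = max(debt + k - r, 0)
        traj.append(debt)
    return traj

def worst_gap(traj):
    """Longest interval between successive returns of the debt to zero,
    i.e. the worst-case decoding delay tau discussed in the abstract."""
    gap, last = 0, -1
    for t, d in enumerate(traj):
        if d == 0:
            gap, last = max(gap, t - last), t
    return gap

# (3,2,m) code; erasures leave only 1 of 3 code symbols at times 2 and 3:
traj = information_debt(k=2, n=3, received=[3, 3, 1, 1, 3, 3])
print(traj, worst_gap(traj))  # [0, 0, 1, 2, 1, 0] with tau = 4

For this pattern τ = 4, so by the result of [1] a memory of m ≥ 4 suffices for recovery; the regime τ > m is the one this paper addresses.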
{"title":"On Information-Debt-Optimal Streaming Codes With Small Memory","authors":"Vinayak Ramkumar, M. Krishnan, Myna Vajha, P. V. Kumar","doi":"10.1109/ISIT50566.2022.9834842","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834842","url":null,"abstract":"In the context of an (n,k,m) convolutional code where k is the number of message symbols, n the number of code symbols and m the memory, Martinian [1] introduced the concept of information debt whose value at time t is the number of additional coded symbols needed to decode all prior message symbols. The same paper shows the existence of (n,k,m) convolutional codes that can recover all prior message symbols whenever the symbol-erasure pattern is such that the maximum time interval τ between successive returns to zero of the information debt function is at most m. The parameter τ also represents the worst-case delay in decoding a message symbol. In the present paper, we study (n,k,m) convolutional codes that possess the analogous property for the case τ > m whenever it is possible to do so. We will refer to such codes as information-debt-optimal streaming (iDOS) codes. We prove the existence of periodically time-varying iDOS codes for all possible {n,k,m,τ} parameters. We also show that m-MDS codes and Maximum Distance Profile convolutional codes are iDOS codes for certain parameter ranges. As a by-product of our existence result, the minimum memory needed for a particular class of streaming codes studied earlier in the literature, is determined.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114653299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpreting Deep-Learned Error-Correcting Codes
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834599
N. Devroye, N. Mohammadi, A. Mulgund, H. Naik, R. Shekhar, Gyoergy Turan, Y. Wei, M. Žefran
Deep learning has been used recently to learn error-correcting encoders and decoders which may improve upon previously known codes in certain regimes. The encoders and decoders are learned "black-boxes", and interpreting their behavior is of interest both for further applications and for incorporating this work into coding theory. Understanding these codes provides a compelling case study for Explainable Artificial Intelligence (XAI): since coding theory is a well-developed and quantitative field, the interpretability problems that arise differ from those traditionally considered. We develop post-hoc interpretability techniques to analyze the deep-learned, autoencoder-based encoders of TurboAE-binary codes, using influence heatmaps, mixed integer linear programming (MILP), Fourier analysis, and property testing. We compare the learned, interpretable encoders combined with BCJR decoders to the original black-box code.
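Of the techniques listed, the influence heatmap is the easiest to sketch generically: flip one message bit at a time and average the change in each code symbol. This is our reconstruction of the idea; the paper analyzes TurboAE encoders, whereas the toy convolutional-style encoder below is only a stand-in.

import numpy as np

rng = np.random.default_rng(3)

def influence_heatmap(encoder, K, N, samples=200):
    """Post-hoc influence of message bit i on code symbol j, estimated by
    flipping bit i in random messages and averaging the output change."""
    H = np.zeros((K, N))
    for _ in range(samples):
        x = rng.integers(0, 2, size=K)
        y = encoder(x)
        for i in range(K):
            xf = x.copy()
            xf[i] ^= 1
            H[i] += np.abs(encoder(xf) - y)
    return H / samples

def toy_encoder(x):
    """Stand-in rate-1/2 encoder (NOT TurboAE): systematic bits followed by
    a memory-2 parity stream."""
    s = np.concatenate(([0, 0], x))
    return np.concatenate([x, (s[2:] + s[1:-1] + s[:-2]) % 2]).astype(float)

print(influence_heatmap(toy_encoder, K=8, N=16).round(2))

For a black-box learned encoder, the banded structure (or lack of it) in such a heatmap is what hints at an underlying sliding-window, convolutional-like rule.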
{"title":"Interpreting Deep-Learned Error-Correcting Codes","authors":"N. Devroye, N. Mohammadi, A. Mulgund, H. Naik, R. Shekhar, Gyoergy Turan, Y. Wei, M. Žefran","doi":"10.1109/ISIT50566.2022.9834599","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834599","url":null,"abstract":"Deep learning has been used recently to learn error-correcting encoders and decoders which may improve upon previously known codes in certain regimes. The encoders and decoders are learned \"black-boxes\", and interpreting their behavior is of interest both for further applications and for incorporating this work into coding theory. Understanding these codes provides a compelling case study for Explainable Artificial Intelligence (XAI): since coding theory is a well-developed and quantitative field, the interpretability problems that arise differ from those traditionally considered. We develop post-hoc interpretability techniques to analyze the deep-learned, autoencoder-based encoders of TurboAE-binary codes, using influence heatmaps, mixed integer linear programming (MILP), Fourier analysis, and property testing. We compare the learned, interpretable encoders combined with BCJR decoders to the original black-box code.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117081601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fundamental Limits of Personalized Federated Linear Regression with Data Heterogeneity
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834894
Chun-Ying Hou, I-Hsiang Wang
Federated learning is a nascent framework for collaborative machine learning over networks of devices with local data and local model updates. Data heterogeneity across the devices is one of the challenges confronting this emerging field. Personalization is a natural approach to simultaneously utilize information from the other users’ data and take data heterogeneity into account. In this work, we study the linear regression problem where the data across users are generated from different regression vectors. We present an information-theoretic lower bound on the minimax expected excess risk of personalized linear models, together with an upper bound that matches the lower bound up to constant factors. The results characterize the effect of data heterogeneity on learning performance and the trade-off between sample size, problem difficulty, and distribution discrepancy, suggesting that the discrepancy-to-difficulty ratio is the key factor governing the effectiveness of heterogeneous data.
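As a toy illustration of the sample-size/discrepancy trade-off (our baseline, not the paper's estimator or bounds): each user interpolates between a purely local least-squares fit and a pooled fit, and the best mixing weight shifts with the heterogeneity level delta.

import numpy as np

rng = np.random.default_rng(4)

def personalized_fit(Xs, ys, alpha):
    """Each user's model is alpha * (local least squares) +
    (1 - alpha) * (pooled least squares); alpha=0 ignores heterogeneity,
    alpha=1 ignores the other users."""
    pooled = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)[0]
    return [alpha * np.linalg.lstsq(X, y, rcond=None)[0] + (1 - alpha) * pooled
            for X, y in zip(Xs, ys)]

# Heterogeneous users: regression vectors scattered around a common base.
d, n_users, n_local, delta = 5, 10, 8, 0.3
base = rng.normal(size=d)
thetas = [base + delta * rng.normal(size=d) for _ in range(n_users)]
Xs = [rng.normal(size=(n_local, d)) for _ in range(n_users)]
ys = [X @ th + 0.5 * rng.normal(size=n_local) for X, th in zip(Xs, thetas)]

for alpha in (0.0, 0.5, 1.0):
    fits = personalized_fit(Xs, ys, alpha)
    risk = np.mean([np.sum((f - t) ** 2) for f, t in zip(fits, thetas)])
    print(alpha, round(risk, 3))

Rerunning with a larger delta (high discrepancy) or smaller noise (easier problem) moves the best alpha toward 1, in the spirit of the discrepancy-to-difficulty ratio highlighted in the abstract.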
{"title":"Fundamental Limits of Personalized Federated Linear Regression with Data Heterogeneity","authors":"Chun-Ying Hou, I-Hsiang Wang","doi":"10.1109/ISIT50566.2022.9834894","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834894","url":null,"abstract":"Federated learning is a nascent framework for collaborative machine learning over networks of devices with local data and local model updates. Data heterogeneity across the devices is one of the challenges confronting this emerging field. Personalization is a natural approach to simultaneously utilize information from the other users’ data and take data heterogeneity into account. In this work, we study the linear regression problem where the data across users are generated from different regression vectors. We present an information-theoretic lower bound of the minimax expected excess risk of personalized linear models. We show an upper bound that matches the lower bound within constant factors. The results characterize the effect of data heterogeneity on learning performance and the trade-off between sample size, problem difficulty, and distribution discrepancy, suggesting that the discrepancy-to-difficulty ratio is the key factor governing the effectiveness of heterogeneous data.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124055626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Ranking Recovery from Noisy Observations up to a Distortion
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834780
Minoh Jeong, Martina Cardone, Alex Dytso
This paper considers the problem of recovering the ranking of a data vector from noisy observations, up to a distortion. Specifically, the noisy observations consist of the original data vector corrupted by isotropic additive Gaussian noise, and the distortion is measured in terms of a distance function between the estimated ranking and the true ranking of the original data vector. First, it is shown that an optimal (in terms of error probability) decision rule for the estimation task simply outputs the ranking of the noisy observation. Then, the error probability incurred by such a decision rule is characterized in the low-noise regime, and shown to grow sublinearly with the noise standard deviation. This result highlights that the proposed approximate version of the ranking recovery problem is significantly less noise-dominated than the exact recovery considered in [Jeong, ISIT 2021].
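The optimal decision rule is one line of code: rank the noisy observation. The sketch below estimates the error probability of this plug-in rule under a distortion budget D; the choice of Kendall tau as the distance and the particular test vector are our assumptions.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)

def kendall(r1, r2):
    """Kendall tau distance: the number of discordant pairs between two rankings."""
    n = len(r1)
    return sum((r1[i] < r1[j]) != (r2[i] < r2[j])
               for i, j in combinations(range(n), 2))

def error_prob(x, sigma, D, trials=2000):
    """Estimate P(d(rank(x + noise), rank(x)) > D) for the rule that simply
    outputs the ranking of the noisy observation."""
    true = np.argsort(np.argsort(x))  # ranking of the clean data vector
    errs = 0
    for _ in range(trials):
        noisy = np.argsort(np.argsort(x + sigma * rng.normal(size=x.size)))
        errs += kendall(true, noisy) > D
    return errs / trials

x = np.array([0.1, 0.5, 0.9, 1.6, 2.0])
for sigma in (0.05, 0.1, 0.2):
    # the paper shows this grows sublinearly in sigma in the low-noise regime
    print(sigma, error_prob(x, sigma, D=1))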
{"title":"On the Ranking Recovery from Noisy Observations up to a Distortion","authors":"Minoh Jeong, Martina Cardone, Alex Dytso","doi":"10.1109/ISIT50566.2022.9834780","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834780","url":null,"abstract":"This paper considers the problem of recovering the ranking of a data vector from noisy observations, up to a distortion. Specifically, the noisy observations consist of the original data vector corrupted by isotropic additive Gaussian noise, and the distortion is measured in terms of a distance function between the estimated ranking and the true ranking of the original data vector. First, it is shown that an optimal (in terms of error probability) decision rule for the estimation task simply outputs the ranking of the noisy observation. Then, the error probability incurred by such a decision rule is characterized in the low-noise regime, and shown to grow sublinearly with the noise standard deviation. This result highlights that the proposed approximate version of the ranking recovery problem is significantly less noise-dominated than the exact recovery considered in [Jeong, ISIT 2021].","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125997084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Algebraic Chase Decoding of Elliptic Codes Through Computing the Gröbner Basis
Pub Date: 2022-06-26 | DOI: 10.1109/ISIT50566.2022.9834889
Wan, Li Chen, Fangguo Zhang
This paper proposes two interpolation-based algebraic Chase decoding algorithms for elliptic codes. They are introduced from the perspective of computing the Gröbner basis of the interpolation module, for which two Chase interpolation approaches are utilized: Kötter’s interpolation and the basis reduction (BR) interpolation. By identifying η unreliable symbols, 2^η decoding test-vectors are formulated, and the corresponding interpolation modules can be defined. Re-encoding further helps transform the test-vectors, facilitating the two interpolation techniques. In particular, Kötter’s interpolation is performed for the common elements of the test-vectors, producing an intermediate outcome that is shared by the decoding of all test-vectors. The desired Gröbner bases w.r.t. all test-vectors can then be obtained in a binary-tree-growing fashion, leading to low complexity, but the decoding latency cannot be contained. In contrast, the BR interpolation first performs the common computation in basis construction, which is shared by all interpolation modules, and then conducts the module basis construction and reduction for all test-vectors in parallel, resulting in a significantly lower decoding latency. Finally, simulation results are presented to demonstrate the effectiveness of the proposed Chase decoding.
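The Chase test-vector formulation is standard and easy to sketch: pick the η least reliable positions and enumerate all 2^η substitutions there. Our illustration below uses binary flipping for simplicity; for the paper's elliptic codes, the alternative symbols at each unreliable position would come from the q-ary channel likelihoods.

import numpy as np
from itertools import product

def chase_test_vectors(hard, reliab, eta):
    """Form the 2**eta Chase test-vectors: take the eta least reliable
    positions of the hard-decision word and try every combination of the
    second-most-likely symbol there (binary flipping here)."""
    weak = np.argsort(reliab)[:eta]  # eta least reliable positions
    vectors = []
    for pattern in product((0, 1), repeat=eta):
        v = hard.copy()
        for pos, flip in zip(weak, pattern):
            v[pos] ^= flip
        vectors.append(v)
    return vectors

hard = np.array([1, 0, 0, 1, 1, 0, 1, 0])
reliab = np.array([2.1, 0.2, 1.7, 0.4, 2.5, 1.9, 0.9, 2.2])
for v in chase_test_vectors(hard, reliab, eta=2):
    print(v)  # 4 test-vectors, differing only on positions 1 and 3

Because all test-vectors agree outside the η weak positions, the interpolation work for the common part can be done once and shared, which is the observation both of the paper's approaches exploit.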
{"title":"Algebraic Chase Decoding of Elliptic Codes Through Computing the Gröbner Basis","authors":"Wan, Li Chen, Fangguo Zhang","doi":"10.1109/ISIT50566.2022.9834889","DOIUrl":"https://doi.org/10.1109/ISIT50566.2022.9834889","url":null,"abstract":"This paper proposes two interpolation-based algebraic Chase decoding for elliptic codes. It is introduced from the perspective of computing the Gröbner basis of the interpolation module, for which two Chase interpolation approaches are utilized. They are Kötter’s interpolation and the basis reduction (BR) interpolation. By identifying η unreliable symbols, 2η decoding test-vectors are formulated, and the corresponding interpolation modules can be defined. The re-encoding further helps transform the test-vectors, facilitating the two interpolation techniques. In particular, Kötter’s interpolation is performed for the common elements of the test-vectors, producing an intermediate outcome that is shared by the decoding of all test-vectors. The desired Gröbner bases w.r.t. all test-vectors can be obtained in a binary tree growing fashion, leading to a low complexity but its decoding latency cannot be contained. In contrast, the BR interpolation first performs the common computation in basis construction which is shared by all interpolation modules, and then conducts the module basis construction and reduction for all test-vectors in parallel. It results in a significantly lower decoding latency. Finally, simulation results are also presented to demonstrate the effectiveness of the proposed Chase decoding.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129807707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}