We use algebraic invariant theory to study three weight enumerators of self-dual quantum codes over finite fields. We show that the weight enumerators of self-dual quantum codes can be expressed algebraically by two polynomials and the double weight enumerators of self-dual quantum codes can be expressed algebraically by five polynomials. We also explicitly compute the complete weight enumerators of some special self-dual quantum codes. Our approach avoids applying the well-known Molien's formula and demonstrates the potential of employing invariant theory to compute weight enumerators of quantum codes.
"Weight enumerators of self-dual quantum codes", Yin Chen, Shan Ren. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03576
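For readers new to the technique, the classical prototype of this invariant-theoretic approach is Gleason's theorem for binary self-dual codes, where the weight enumerator is likewise pinned down by a small polynomial algebra (the paper's quantum setting uses a different group and generators; the sketch below is the classical analogue only):

```latex
% MacWilliams identity: for a binary self-dual code C = C^\perp of length n,
% the weight enumerator is invariant under the substitution below; since all
% weights in a self-dual binary code are even, W_C is also invariant under
% y -> -y.
\[
W_C(x,y) \;=\; \sum_{c \in C} x^{\,n-\mathrm{wt}(c)}\, y^{\,\mathrm{wt}(c)}
\;=\; W_C\!\left(\frac{x+y}{\sqrt{2}},\; \frac{x-y}{\sqrt{2}}\right).
\]
% Gleason's theorem: the invariant ring of the group generated by these two
% substitutions is freely generated by two polynomials, so
\[
W_C \;\in\; \mathbb{C}\bigl[\, x^{2}+y^{2},\;\; x^{2}y^{2}\,(x^{2}-y^{2})^{2} \,\bigr].
\]
```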
Recently, deep learning (DL) based image transmission at the physical layer (PL) has become a rising trend due to its ability to significantly outperform conventional separation-based digital transmissions. However, implementing solutions at the PL requires a major shift in established standards, such as those in cellular communications. Application layer (AL) solutions present a more feasible and standards-compliant alternative. In this work, we propose a layered image transmission scheme at the AL that is robust to end-to-end (E2E) channel errors. The base layer transmits a coarse image, while the enhancement layer transmits the residual between the original and coarse images. By mapping the residual image into a latent representation that aligns with the structure of the E2E channel, our proposed solution demonstrates high robustness to E2E channel errors.
"Robust End-to-End Image Transmission with Residual Learning", Cenk M. Yetis. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03243
Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson
In this paper, we examine the energy efficiency (EE) of a base station (BS) with multiple antennas. We use a state-of-the-art power consumption model, taking into account the passive and active parts of the transceiver circuitry, including the effects of radiated power, signal processing, and passive consumption. The paper treats the transmit power, bandwidth, and number of antennas as the optimization variables. We provide novel closed-form solutions for the optimal ratios of power per unit bandwidth and power per transmit antenna. We present a novel algorithm that jointly optimizes these variables to achieve maximum EE, while fulfilling constraints on the variable ranges. We also discover a new relationship between the radiated power and the passive transceiver power consumption. We provide analytical insight into whether using maximum power or bandwidth is optimal and how many antennas a BS should utilize.
"Fundamentals of Energy-Efficient Wireless Links: Optimal Ratios and Scaling Behaviors", Anders Enqvist, Özlem Tuğfe Demir, Cicek Cavdar, Emil Björnson. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03436
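The paper derives closed-form optimal ratios for its specific consumption model. As a generic illustration of the trade-off being optimized, the sketch below grid-searches a toy energy-efficiency function over transmit power and bandwidth (the constants `N0`, `ETA`, `P0`, `C` are illustrative, not the authors' model):

```python
import math

# Toy single-link model: rate B*log2(1 + p/(N0*B)); consumed power
# p/ETA + P0 + C*B for amplifier efficiency ETA, static power P0, and
# bandwidth-proportional processing power C*B. Constants are illustrative.
N0, ETA, P0, C = 0.1, 0.4, 1.0, 0.05

def energy_efficiency(p, b):
    rate = b * math.log2(1.0 + p / (N0 * b))   # bit/s
    return rate / (p / ETA + P0 + C * b)       # bit/Joule

# Exhaustive grid search over p in [0.1, 10] and B in [0.5, 20].
best_ee, best_p, best_b = max(
    (energy_efficiency(0.1 * i, 0.5 * j), 0.1 * i, 0.5 * j)
    for i in range(1, 101) for j in range(1, 41))
print(best_p, best_b, round(best_ee, 3))
```

With these constants the search settles on an interior transmit power but the maximum available bandwidth (the static power P0 is amortized over more bandwidth), echoing the abstract's question of when maximum power or maximum bandwidth is optimal.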
Sensor placement approaches in networks often rely on information-theoretic measures such as entropy and mutual information. We prove that the mutual information between the states of the network and observations of a subset of $k$ nodes subjected to additive white Gaussian noise is submodular and non-decreasing in the subset. We prove this under the assumption that the states follow a non-degenerate multivariate Gaussian distribution.
"Submodularity of Mutual Information for Multivariate Gaussian Sources with Additive Noise", George Crowley, Inaki Esnaola. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03541
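The claim can be checked numerically on a small instance. The sketch below (pure Python; the covariance and noise variance are illustrative choices, not from the paper) verifies that f(S) = I(X; Y_S) is non-decreasing and submodular over all subsets of a four-node network, using the Gaussian identity I(X; Y_S) = 0.5 log det(I + Σ_SS/σ²):

```python
import math
from itertools import combinations

def det(m):
    # Laplace expansion; adequate for the tiny matrices used here.
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def mutual_info(sigma, S, noise_var=1.0):
    # I(X; Y_S) for Y_S = X_S + N with N ~ N(0, noise_var * I):
    # equals 0.5 * log det(I + Sigma_SS / noise_var).
    S = sorted(S)
    if not S:
        return 0.0
    sub = [[(1.0 if a == b else 0.0) + sigma[i][j] / noise_var
            for b, j in enumerate(S)] for a, i in enumerate(S)]
    return 0.5 * math.log(det(sub))

# A fixed non-degenerate covariance: Sigma = A A^T + 0.5 I (illustrative numbers).
A = [[1.0, 0.2], [0.5, 1.0], [0.0, 0.7], [0.3, 0.4]]
n = len(A)
sigma = [[sum(A[i][k] * A[j][k] for k in range(2)) + (0.5 if i == j else 0.0)
          for j in range(n)] for i in range(n)]

def subsets(ground):
    ground = sorted(ground)
    return [set(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

# Submodularity: the marginal gain of adding node i shrinks as the set grows.
# Monotonicity: every marginal gain is nonnegative.
ok = True
for T in subsets(range(n)):
    for S in subsets(T):
        for i in set(range(n)) - T:
            gain_S = mutual_info(sigma, S | {i}) - mutual_info(sigma, S)
            gain_T = mutual_info(sigma, T | {i}) - mutual_info(sigma, T)
            if gain_S < gain_T - 1e-9 or gain_S < -1e-9:
                ok = False
print(ok)
```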
We examine the connection between deep learning and information theory through the paradigm of diffusion models. Using well-established principles from non-equilibrium thermodynamics we can characterize the amount of information required to reverse a diffusive process. Neural networks store this information and operate in a manner reminiscent of Maxwell's demon during the generative stage. We illustrate this cycle using a novel diffusion scheme we call the entropy matching model, wherein the information conveyed to the network during training exactly corresponds to the entropy that must be negated during reversal. We demonstrate that this entropy can be used to analyze the encoding efficiency and storage capacity of the network. This conceptual picture blends elements of stochastic optimal control, thermodynamics, information theory, and optimal transport, and raises the prospect of applying diffusion models as a test bench to understand neural networks.
"Neural Entropy", Akhil Premkumar. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03817
In this paper, we present a novel hemispherical antenna array (HAA) designed for high-altitude platform stations (HAPS). A significant limitation of traditional rectangular antenna arrays for HAPS is that their antenna elements are oriented downward, resulting in low gains for distant users. Cylindrical antenna arrays were introduced to mitigate this drawback; however, their antenna elements face the horizon, leading to suboptimal gains for users located beneath the HAPS. To address these challenges, we introduce the HAA in this study. The HAA's antenna elements are strategically distributed across the surface of a hemisphere so that each user is directly aligned with specific antenna elements. To maximize the users' minimum signal-to-interference-plus-noise ratio (SINR), we formulate an optimization problem. After performing analog beamforming, we introduce an antenna selection algorithm and show that this method achieves optimality when a substantial number of antenna elements are selected for each user. Additionally, we employ the bisection method to determine the optimal power allocation for each user. Our simulation results convincingly demonstrate that the proposed HAA outperforms the conventional arrays and provides uniform rates across the entire coverage area. With a $20~\mathrm{MHz}$ communication bandwidth and a $50~\mathrm{dBm}$ total power, the proposed approach reaches sum rates of $14~\mathrm{Gbps}$.
"Hemispherical Antenna Array Architecture for High-Altitude Platform Stations (HAPS) for Uniform Capacity Provision", Omid Abbasi, Halim Yanikomeroglu, Georges Kaddoum. arXiv - MATH - Information Theory, 2024-09-05. DOI: https://doi.org/arxiv-2409.03474
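The bisection step in the abstract can be illustrated on a stripped-down max-min power-allocation problem (orthogonal, interference-free links with hypothetical gains; not the paper's HAPS setup): a common SINR target t is feasible iff the per-user powers p_i = t·N/g_i it requires fit the total budget, and the largest feasible t is found by bisection on t.

```python
def min_power_for_target(t, gains, noise):
    # Power needed per user to hit SINR target t on an orthogonal,
    # interference-free link: p_i = t * noise / g_i.
    return [t * noise / g for g in gains]

def max_min_sinr(gains, noise, p_total, iters=60):
    # Bisection on the common SINR target: a target is feasible iff
    # the total power it requires fits the budget p_total.
    lo, hi = 0.0, p_total * max(gains) / noise   # hi is surely infeasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(min_power_for_target(mid, gains, noise)) <= p_total:
            lo = mid   # mid is feasible, push the target up
        else:
            hi = mid
    return lo

gains = [1.0, 0.5, 0.25]   # hypothetical channel gains
t_star = max_min_sinr(gains, noise=1.0, p_total=7.0)
print(round(t_star, 6))    # → 1.0
```

For this separable toy case the answer has the closed form t* = P / (N · Σ 1/g_i) = 7 / (1 + 2 + 4) = 1.0, which the bisection recovers; in the paper's setting the feasibility check is the nontrivial part.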
Li Xie, Liangyan Li, Jun Chen, Lei Yu, Zhongshan Zhang
This paper investigates the best known bounds on the quadratic Gaussian distortion-rate-perception function with limited common randomness for the Kullback-Leibler divergence-based perception measure, as well as their counterparts for the squared Wasserstein-2 distance-based perception measure, recently established by Xie et al. These bounds are shown to be nondegenerate in the sense that they cannot be deduced from each other via a refined version of Talagrand's transportation inequality. On the other hand, an improved lower bound is established when the perception measure is given by the squared Wasserstein-2 distance. In addition, it is revealed by exploiting the connection between rate-distortion-perception coding and entropy-constrained scalar quantization that all the aforementioned bounds are generally not tight in the weak perception constraint regime.
"Gaussian Rate-Distortion-Perception Coding and Entropy-Constrained Scalar Quantization", Li Xie, Liangyan Li, Jun Chen, Lei Yu, Zhongshan Zhang. arXiv - MATH - Information Theory, 2024-09-04. DOI: https://doi.org/arxiv-2409.02388
Polar codes are a class of error-correcting codes that provably achieve the capacity of practical channels under the low-complexity successive-cancellation flip (SCF) decoding algorithm. However, the SCF decoding algorithm has a variable execution time with a high (worst-case) decoding latency. This characteristic poses a challenge to the design of receivers that have to operate at fixed data rates. In this work, we propose a multi-threshold mechanism that restrains the delay of an SCF decoder depending on the state of the buffer to avoid overflow. We show that the proposed mechanism provides better error-correction performance compared to a straightforward codeword-dropping mechanism at the cost of a small increase in complexity. In the region of interest for wireless communications, the proposed mechanism can prevent buffer overflow while operating with a fixed channel-production rate that is 1.125 times lower than the rate associated with a single decoding trial.
"Successive-Cancellation Flip Decoding of Polar Codes Under Fixed Channel-Production Rate", Ilshat Sagitov, Charles Pillet, Pascal Giard. arXiv - MATH - Information Theory, 2024-09-04. DOI: https://doi.org/arxiv-2409.03051
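A toy queueing sketch (an illustrative schedule, not the authors' exact multi-threshold rule or parameters) shows the mechanism's intent: capping the number of SCF trials by the remaining buffer space keeps occupancy bounded even though decoding time is random, at the cost of occasionally abandoning a codeword.

```python
import random

def allowed_trials(occupancy, capacity, t_max):
    # Illustrative threshold schedule: the fuller the receive buffer,
    # the fewer SCF decoding trials the current codeword may spend.
    return max(1, min(t_max, capacity - occupancy))

random.seed(7)
capacity, t_max = 16, 8
arrival_interval = 1.125   # channel delivers one codeword per 1.125 trial-times
finish = 0.0
max_occupancy = 0
giveups = 0                # codewords abandoned when the trial cap is hit
for n in range(20_000):
    arrival = n * arrival_interval
    start = max(arrival, finish)
    # Codewords that have arrived by `start` and are not yet decoded.
    occupancy = int(start // arrival_interval) - n + 1
    max_occupancy = max(max_occupancy, occupancy)
    cap = allowed_trials(occupancy, capacity, t_max)
    trials_needed = 1                  # geometric number of SCF trials
    while random.random() < 0.5:
        trials_needed += 1
    used = min(trials_needed, cap)
    if trials_needed > cap:
        giveups += 1
    finish = start + used              # each trial occupies one trial-time

print(max_occupancy, giveups)
```

Because the cap never exceeds the free buffer space and at most one codeword arrives per trial-time, occupancy can never exceed the capacity in this model, which the simulation confirms.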
Jiyuan Yang, Yan Chen, Xiqi Gao, Xiang-Gen Xia, Dirk Slock
We propose a group information geometry approach (GIGA) for ultra-massive multiple-input multiple-output (MIMO) signal detection. The signal detection task is framed as computing the approximate marginals of the a posteriori distribution of the transmitted data symbols of all users. With the approximate marginals, we perform the maximization of the a posteriori marginals (MPM) detection to recover the symbol of each user. Based on information geometry theory and the grouping of the components of the received signal, three types of manifolds are constructed and the approximate a posteriori marginals are obtained through m-projections. The Berry-Esseen theorem is introduced to offer an approximate calculation of the m-projection, whose direct calculation is exponentially complex. In most cases, using more groups lowers the complexity of GIGA. However, when the number of groups exceeds a certain threshold, the complexity of GIGA starts to increase. Simulation results confirm that the proposed GIGA achieves better bit error rate (BER) performance within a small number of iterations, which demonstrates that it can serve as an efficient detection method in ultra-massive MIMO systems.
"Group Information Geometry Approach for Ultra-Massive MIMO Signal Detection", Jiyuan Yang, Yan Chen, Xiqi Gao, Xiang-Gen Xia, Dirk Slock. arXiv - MATH - Information Theory, 2024-09-04. DOI: https://doi.org/arxiv-2409.02616
The $k$-Minimum Values (kmv) data sketch algorithm stores the $k$ least hash keys generated by hashing the items in a dataset. We show that compression based on ordering the keys and encoding successive differences can offer $O(\log n)$ bits per key in expected storage savings, where $n$ is the number of unique values in the data set. We also show that $O(\log n)$ expected bits saved per key is optimal for any form of compression for the $k$ least of $n$ random values -- that is, the encoding method is near-optimal among all methods to encode a kmv sketch. We present a practical method to perform that compression, show that it is computationally efficient, and demonstrate that its average savings in practice are within about five percent of the theoretical minimum based on entropy. We verify that our method outperforms off-the-shelf compression methods, and we demonstrate that it is practical, using real and synthetic data.
"Key Compression Limits for $k$-Minimum Value Sketches", Charlie Dickens, Eric Bax, Alexander Saydakov. arXiv - MATH - Information Theory, 2024-09-04. DOI: https://doi.org/arxiv-2409.02852
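A minimal sketch of the encoding idea (blake2b as a stand-in 64-bit hash; all parameters illustrative, and the bit-cost estimate is a varint-style proxy rather than the paper's method): sort the $k$ minimum hash keys and store successive differences. The gaps between the $k$ minima of $n$ uniform 64-bit hashes concentrate near $2^{64}/n$, so each needs roughly $64 - \log_2 n$ bits instead of 64, matching the $O(\log n)$ savings per key.

```python
import hashlib
import random

def kmv_sketch(items, k):
    # The kmv sketch: the k smallest 64-bit hash values of the distinct items.
    keys = sorted(int.from_bytes(
        hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        for x in set(items))
    return keys[:k]

def delta_encode(keys):
    # Sorted keys -> the first key followed by successive differences.
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]

def delta_decode(deltas):
    keys, acc = [], 0
    for d in deltas:
        acc += d
        keys.append(acc)
    return keys

random.seed(0)
sketch = kmv_sketch((random.random() for _ in range(100_000)), k=256)
deltas = delta_encode(sketch)
assert delta_decode(deltas) == sketch   # lossless round trip

raw_bits = 64 * len(sketch)
# Varint-style cost proxy: each delta stored in about bit_length(d) bits.
packed_bits = sum(max(1, d.bit_length()) for d in deltas)
print(raw_bits, packed_bits)
```

On this instance the delta-encoded keys need tens of bits fewer per key than the raw 64-bit representation, consistent with the $O(\log n)$ analysis in the abstract.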