{"title":"基于Arcsinh-Compander的大字母概率分布的高效表示","authors":"Aviv Adler, Jennifer Tang, Yury Polyanskiy","doi":"10.1109/ISIT50566.2022.9834837","DOIUrl":null,"url":null,"abstract":"A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to 1. In some cases it is required to represent such a vector with only b bits per entry. A natural choice is to partition the interval [0,1] into 2b uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure – applying an entrywise non-linear function (compander) f(x) prior to quantization – yields an extremely effective quantization method. For example, for b = 8(16) and 105-sized alphabets, the quality of representation improves from a loss (under KL divergence) of 0.5(0.1) bits/entry to 10−4(10−9) bits/entry. Compared to floating point representations, our compander method improves the loss from 10−1(10−6) to 10−4(10−9) bits/entry. These numbers hold for both real-world data (word frequencies in books and DNA k-mer counts) and for synthetic randomly generated distributions. Theoretically, we set up a minimax optimality criterion and show that the compander $f(x) \\propto \\operatorname{ArcSinh} (\\sqrt {(1/2)(K\\log K)x} )$ achieves near-optimal performance, attaining a KL-quantization loss of ≍ 2−2b log2 K for a K-letter alphabet and b →∞. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the ArcSinh quantizer is as fundamental for KL-distortion as the uniform quantizer for quadratic distortion.","PeriodicalId":348168,"journal":{"name":"2022 IEEE International Symposium on Information Theory (ISIT)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Representation of Large-Alphabet Probability Distributions via Arcsinh-Compander\",\"authors\":\"Aviv Adler, Jennifer Tang, Yury Polyanskiy\",\"doi\":\"10.1109/ISIT50566.2022.9834837\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to 1. In some cases it is required to represent such a vector with only b bits per entry. A natural choice is to partition the interval [0,1] into 2b uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure – applying an entrywise non-linear function (compander) f(x) prior to quantization – yields an extremely effective quantization method. For example, for b = 8(16) and 105-sized alphabets, the quality of representation improves from a loss (under KL divergence) of 0.5(0.1) bits/entry to 10−4(10−9) bits/entry. Compared to floating point representations, our compander method improves the loss from 10−1(10−6) to 10−4(10−9) bits/entry. These numbers hold for both real-world data (word frequencies in books and DNA k-mer counts) and for synthetic randomly generated distributions. 
Theoretically, we set up a minimax optimality criterion and show that the compander $f(x) \\\\propto \\\\operatorname{ArcSinh} (\\\\sqrt {(1/2)(K\\\\log K)x} )$ achieves near-optimal performance, attaining a KL-quantization loss of ≍ 2−2b log2 K for a K-letter alphabet and b →∞. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the ArcSinh quantizer is as fundamental for KL-distortion as the uniform quantizer for quadratic distortion.\",\"PeriodicalId\":348168,\"journal\":{\"name\":\"2022 IEEE International Symposium on Information Theory (ISIT)\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Symposium on Information Theory (ISIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISIT50566.2022.9834837\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Symposium on Information Theory (ISIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISIT50566.2022.9834837","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient Representation of Large-Alphabet Probability Distributions via Arcsinh-Compander
A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to 1. In some cases it is required to represent such a vector with only $b$ bits per entry. A natural choice is to partition the interval $[0,1]$ into $2^b$ uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure, applying an entrywise non-linear function (compander) $f(x)$ prior to quantization, yields an extremely effective quantization method. For example, for $b = 8$ (resp. 16) and alphabets of size $10^5$, the quality of representation improves from a loss (under KL divergence) of $0.5$ (resp. $0.1$) bits/entry to $10^{-4}$ (resp. $10^{-9}$) bits/entry. Compared to floating-point representations, our compander method improves the loss from $10^{-1}$ (resp. $10^{-6}$) to $10^{-4}$ (resp. $10^{-9}$) bits/entry. These numbers hold both for real-world data (word frequencies in books and DNA k-mer counts) and for synthetic randomly generated distributions. Theoretically, we set up a minimax optimality criterion and show that the compander $f(x) \propto \operatorname{ArcSinh}(\sqrt{(1/2)(K\log K)\,x})$ achieves near-optimal performance, attaining a KL-quantization loss of $\asymp 2^{-2b}\log^2 K$ for a $K$-letter alphabet as $b \to \infty$. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the ArcSinh quantizer is as fundamental for KL-distortion as the uniform quantizer is for quadratic distortion.
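To make the construction concrete, the following is a minimal Python sketch (not the authors' implementation) of companded scalar quantization of a probability vector: each entry is passed through the normalized ArcSinh compander, snapped to the midpoint of one of $2^b$ uniform bins, mapped back through the inverse compander, and the quantized vector is renormalized before the KL loss is measured. The midpoint reconstruction, the final renormalization, and the per-entry bookkeeping are simplifying assumptions made for illustration.

```python
import numpy as np

def arcsinh_compander(x, K):
    # f(x) proportional to arcsinh(sqrt((1/2) * K * log(K) * x)),
    # normalized so that f maps [0, 1] onto [0, 1]
    c = 0.5 * K * np.log(K)
    return np.arcsinh(np.sqrt(c * x)) / np.arcsinh(np.sqrt(c))

def arcsinh_expander(y, K):
    # inverse of the compander: maps companded values back to [0, 1]
    c = 0.5 * K * np.log(K)
    return np.sinh(y * np.arcsinh(np.sqrt(c))) ** 2 / c

def quantize(p, b, K, use_compander=True):
    # snap each entry to the midpoint of one of 2**b uniform bins,
    # optionally in the companded domain, then renormalize
    # (a simplifying choice, not necessarily the paper's reconstruction rule)
    n_bins = 2 ** b
    y = arcsinh_compander(p, K) if use_compander else p
    idx = np.minimum(np.floor(y * n_bins), n_bins - 1)
    y_hat = (idx + 0.5) / n_bins
    q = arcsinh_expander(y_hat, K) if use_compander else y_hat
    return q / q.sum()

def kl_bits_per_entry(p, q):
    # D(p || q) in bits, divided by the alphabet size K
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask])) / p.size

# toy comparison on a synthetic distribution over K letters
rng = np.random.default_rng(0)
K, b = 10_000, 8
p = rng.dirichlet(np.ones(K))
for flag, name in [(False, "uniform"), (True, "arcsinh")]:
    q = quantize(p, b, K, use_compander=flag)
    print(f"{name:8s}: {kl_bits_per_entry(p, q):.2e} bits/entry")
```

On such a synthetic distribution the companded quantizer should give a KL loss several orders of magnitude below the plain uniform quantizer, mirroring the gap reported in the abstract.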