
Latest Publications in IEEE Transactions on Information Theory

Efficient Solvers for Wyner Common Information With Application to Multi-Modal Clustering
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-20 | DOI: 10.1109/TIT.2025.3532280
Teng-Hui Huang;Hesham El Gamal
In this work, we propose computationally efficient solvers for novel extensions of Wyner common information. By separating the information sources into two parts, the proposed Bipartite common information framework has a difference-of-convex structure that enables efficient non-convex optimization. When the joint distribution is known, our difference-of-convex algorithm (DCA)-based solver has a provable convergence guarantee to local stationary points. In unknown-distribution settings, the insights from DCA, combined with exponential-family parameterization, allow for closed-form expressions for efficient estimation. Furthermore, we show that the Bipartite common information applies to multi-modal clustering without employing ad-hoc clustering algorithms. Empirically, our solvers outperform state-of-the-art methods in clustering accuracy and running time over a range of non-trivial multi-modal clustering datasets with different numbers of data modalities.
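The solver itself is not spelled out above, but the difference-of-convex structure it exploits has a standard iterative template. Below is a minimal sketch of a generic DCA loop on a toy objective; it illustrates the algorithmic idea only, and the objective, step sizes, and inexact inner solver are assumptions, not the paper's Wyner-common-information solver.

```python
import numpy as np

# Generic difference-of-convex algorithm (DCA) template: minimize
# f(x) = g(x) - h(x) with g, h convex. Each outer step linearizes h at the
# current iterate x_k and approximately minimizes the convex surrogate
# g(x) - <grad_h(x_k), x> by plain gradient descent.

def dca(grad_g, grad_h, x0, outer=100, inner=50, lr=0.1, tol=1e-8):
    x = x0.copy()
    for _ in range(outer):
        lin = grad_h(x)                  # linearize the concave part at x_k
        y = x.copy()
        for _ in range(inner):           # inexact solve of the convex surrogate
            y = y - lr * (grad_g(y) - lin)
        if np.linalg.norm(y - x) < tol:  # no progress: a stationary point
            break
        x = y
    return x

# Toy DC objective: f(x) = ||x||^2 - ||x||_1, stationary where |x_i| = 1/2.
grad_g = lambda x: 2.0 * x               # gradient of the convex part ||x||^2
grad_h = lambda x: np.sign(x)            # subgradient of the concave part ||x||_1

print(dca(grad_g, grad_h, np.array([2.0, -3.0])))  # approx. [0.5, -0.5]
```

Because the surrogate in every outer step is convex, each iteration decreases the objective (exactly so when the surrogate is solved exactly), which is the mechanism behind convergence guarantees to stationary points of the kind stated in the abstract.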
{"title":"Efficient Solvers for Wyner Common Information With Application to Multi-Modal Clustering","authors":"Teng-Hui Huang;Hesham El Gamal","doi":"10.1109/TIT.2025.3532280","DOIUrl":"https://doi.org/10.1109/TIT.2025.3532280","url":null,"abstract":"In this work, we propose computationally efficient solvers for novel extensions of Wyner common information. By separating information sources into bipartite, the proposed Bipartite common information framework has difference-of-convex structure for efficient non-convex optimization. In known joint distribution cases, our difference-of-convex algorithm(DCA)-based solver has a provable convergence guarantee to local stationary points. As for unknown distribution settings, the insights from DCA combined with the exponential family of distributions for parameterization allows for closed-form expressions for efficient estimation. Furthermore, we show that the Bipartite common information applies to multi-modal clustering without employing ad-hoc clustering algorithms. Empirically, our solvers outperform state-of-the-art methods in clustering accuracy and running time over a range of non-trivial multi-modal clustering datasets with different number of data modalities.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2054-2074"},"PeriodicalIF":2.2,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Asymptotically Optimal Codes for (t, s)-Burst Error
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-20 | DOI: 10.1109/TIT.2025.3531915
Yubo Sun;Ziyang Lu;Yiwei Zhang;Gennian Ge
Recently, codes for correcting a burst of errors have attracted significant attention. One of the most important reasons is that bursts of errors occur in certain emerging techniques, such as DNA storage. In this paper, we investigate a type of error, called a $(t,s)$-burst, which deletes t consecutive symbols and inserts s arbitrary symbols at the same coordinate. Note that a $(t,s)$-burst error can be seen as a generalization of a burst of insertions ($t=0$), a burst of deletions ($s=0$), and a burst of substitutions ($t=s$). Our main contribution is to give explicit constructions of q-ary $(t,s)$-burst correcting codes with $\log n + O(1)$ bits of redundancy for any given constant non-negative integers t, s, and $q \geq 2$. These codes have optimal redundancy up to an additive constant. Furthermore, we apply our $(t,s)$-burst correcting codes to combat various other types of errors and improve the corresponding results. In particular, one of our byproducts is a permutation code capable of correcting a burst of t stable deletions with $\log n + O(1)$ bits of redundancy, which is optimal up to an additive constant.
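To make the error model concrete, here is a small sketch of a $(t,s)$-burst acting on a sequence; the helper name and binary alphabet are illustrative choices, not from the paper.

```python
import random

def ts_burst(seq, i, t, s, alphabet=(0, 1)):
    """Apply a (t,s)-burst at coordinate i: delete t consecutive symbols
    starting at position i, then insert s arbitrary symbols there.
    t=0 gives a burst of insertions, s=0 a burst of deletions, and
    t=s a burst of substitutions."""
    assert 0 <= i <= len(seq) - t
    inserted = [random.choice(alphabet) for _ in range(s)]
    return seq[:i] + inserted + seq[i + t:]

x = [0, 1, 1, 0, 1, 0, 0, 1]
print(ts_burst(x, i=2, t=3, s=2))  # length changes from 8 to 7
```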
{"title":"Asymptotically Optimal Codes for (t, s)-Burst Error","authors":"Yubo Sun;Ziyang Lu;Yiwei Zhang;Gennian Ge","doi":"10.1109/TIT.2025.3531915","DOIUrl":"https://doi.org/10.1109/TIT.2025.3531915","url":null,"abstract":"Recently, codes for correcting a burst of errors have attracted significant attention. One of the most important reasons is that bursts of errors occur in certain emerging techniques, such as DNA storage. In this paper, we investigate a type of error, called a <inline-formula> <tex-math>$(t,s)$ </tex-math></inline-formula>-burst, which deletes t consecutive symbols and inserts s arbitrary symbols at the same coordinate. Note that a <inline-formula> <tex-math>$(t,s)$ </tex-math></inline-formula>-burst error can be seen as a generalization of a burst of insertions (<inline-formula> <tex-math>$t=0$ </tex-math></inline-formula>), a burst of deletions (<inline-formula> <tex-math>$s=0$ </tex-math></inline-formula>), and a burst of substitutions (<inline-formula> <tex-math>$t=s$ </tex-math></inline-formula>). Our main contribution is to give explicit constructions of q-ary <inline-formula> <tex-math>$(t,s)$ </tex-math></inline-formula>-burst correcting codes with <inline-formula> <tex-math>$log n + O(1)$ </tex-math></inline-formula> bits of redundancy for any given constant non-negative integers t, s, and <inline-formula> <tex-math>$q geq 2$ </tex-math></inline-formula>. These codes have optimal redundancy up to an additive constant. Furthermore, we apply our <inline-formula> <tex-math>$(t,s)$ </tex-math></inline-formula>-burst correcting codes to combat other various types of errors and improve the corresponding results. In particular, one of our byproducts is a permutation code capable of correcting a burst of t stable deletions with <inline-formula> <tex-math>$log n + O(1)$ </tex-math></inline-formula> bits of redundancy, which is optimal up to an additive constant.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1570-1584"},"PeriodicalIF":2.2,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalization Performance of Empirical Risk Minimization on Over-Parameterized Deep ReLU Nets
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-17 | DOI: 10.1109/TIT.2025.3531048
Shao-Bo Lin;Yao Wang;Ding-Xuan Zhou
In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.
{"title":"Generalization Performance of Empirical Risk Minimization on Over-Parameterized Deep ReLU Nets","authors":"Shao-Bo Lin;Yao Wang;Ding-Xuan Zhou","doi":"10.1109/TIT.2025.3531048","DOIUrl":"https://doi.org/10.1109/TIT.2025.3531048","url":null,"abstract":"In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1978-1993"},"PeriodicalIF":2.2,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844907","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
How Does Distribution Matching Help Domain Generalization: An Information-Theoretic Analysis
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-17 | DOI: 10.1109/TIT.2025.3531136
Yuxin Dong;Tieliang Gong;Hong Chen;Shuangyong Song;Weizhan Zhang;Chen Li
Domain generalization aims to learn invariance across multiple source domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success in domain generalization, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.
{"title":"How Does Distribution Matching Help Domain Generalization: An Information-Theoretic Analysis","authors":"Yuxin Dong;Tieliang Gong;Hong Chen;Shuangyong Song;Weizhan Zhang;Chen Li","doi":"10.1109/TIT.2025.3531136","DOIUrl":"https://doi.org/10.1109/TIT.2025.3531136","url":null,"abstract":"Domain generalization aims to learn invariance across multiple source domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success in domain generalization, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2028-2053"},"PeriodicalIF":2.2,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Codes for Correcting a Burst of Edits Using Weighted-Summation VT Sketch
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-16 | DOI: 10.1109/TIT.2025.3530506
Yubo Sun;Gennian Ge
Bursts of errors are a class of errors that can be found in a variety of applications. A burst of t edits refers to a burst of t deletions, or a burst of t insertions, or a burst of t substitutions. This paper focuses on studying codes that can correct a burst of t edits. Our primary approach involves a tool called the weighted-summation VT sketch. The $(t,k)$-weighted-summation VT sketch of a length-n sequence is defined as the weighted summation of the VT sketch of each row of its $t\times \lceil n/t \rceil$ array representation, with the weight of the i-th row set to $k^{i-1}$ for $i=1,2,\ldots,t$. By employing the weighted-summation VT sketch alongside multiple weight sketches, we introduce a construction for q-ary t-burst-substitution correcting codes with a redundancy of $\log n+O(1)$, where the logarithm base is 2. Subsequently, we improve the redundancy to address specific types of burst-substitution errors, such as inversion errors, adjacent-block-transposition errors, and absorption errors. Moreover, by utilizing the method developed in the construction of burst-substitution correcting codes and imposing additional run-length-limited constraints, locally-bounded constraints, and strong-locally-balanced constraints, respectively, we introduce three constructions of t-burst-deletion correcting codes, each requiring a redundancy of $\log n+O(\log \log n)$. Any t-burst-deletion correcting code is also a t-burst-insertion correcting code, allowing us to intersect the t-burst-substitution correcting codes and t-burst-deletion correcting codes designed above to derive three constructions of q-ary t-burst-edit correcting codes. The first two constructions have a redundancy of $\log n+(t\log q-1)\log \log n+O(1)$, while the third construction has a redundancy of $\log n+\log \log n+O(1)$. Most of the proposed codes demonstrate superior performance compared to previous results, with the exception of burst-deletion correcting codes. Furthermore, in the case of single-edit errors (a t-burst-edit error with $t=1$), the redundancy of the first two constructions of quaternary single-edit correcting codes outperforms the results of Gabrys et al. (IEEE Trans. Inf. Theory 2023). We also provide efficient encoding and decoding algorithms for our codes to enhance their practical usability.
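The sketch definition above is concrete enough to compute directly. The snippet below does so under two assumptions the abstract leaves open: each row's sketch is the classical binary VT sketch $\sum_i i\,x_i$, and the array is filled column by column (the standard choice in burst-correcting constructions, since a burst of t consecutive symbols then touches each row exactly once).

```python
import numpy as np

def vt_sketch(row):
    # Classical Varshamov-Tenengolts sketch: sum of i * x_i with 1-based i.
    return int(sum((i + 1) * v for i, v in enumerate(row)))

def ws_vt_sketch(x, t, k):
    """(t,k)-weighted-summation VT sketch of the sequence x: arrange x into
    a t x ceil(n/t) array (column-major filling assumed here) and take the
    weighted sum of the row-wise VT sketches, row i weighted by k**(i-1)."""
    n = len(x)
    cols = -(-n // t)                      # ceil(n/t)
    arr = np.zeros((t, cols), dtype=int)
    for j, v in enumerate(x):
        arr[j % t, j // t] = v             # symbol j lands in row j mod t
    return sum(k ** r * vt_sketch(arr[r]) for r in range(t))

# Rows of [1,0,1,1,0,1] for t=2 are [1,1,0] and [0,1,1]; sketch = 3 + 3*5 = 18.
print(ws_vt_sketch([1, 0, 1, 1, 0, 1], t=2, k=3))
```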
{"title":"Codes for Correcting a Burst of Edits Using Weighted-Summation VT Sketch","authors":"Yubo Sun;Gennian Ge","doi":"10.1109/TIT.2025.3530506","DOIUrl":"https://doi.org/10.1109/TIT.2025.3530506","url":null,"abstract":"Bursts of errors are a class of errors that can be found in a variety of applications. A burst of t edits refers to a burst of t deletions, or a burst of t insertions, or a burst of t substitutions. This paper focuses on studying codes that can correct a burst of t edits. Our primary approach involves the use of the tool called weighted-summation VT sketch. The <inline-formula> <tex-math>$(t,k)$ </tex-math></inline-formula>-weighted-summation VT sketch of a length-n sequence is defined as the weighted summation of the VT sketch of each row of its <inline-formula> <tex-math>$ttimes lceil n/t rceil $ </tex-math></inline-formula> array representation, with weights in the i-th row set as <inline-formula> <tex-math>$k^{i-1}$ </tex-math></inline-formula> for <inline-formula> <tex-math>$i=1,2,ldots,t$ </tex-math></inline-formula>. By employing the weighted-summation VT sketch alongside multiple weight sketches, we introduce a construction for q-ary t-burst-substitution correcting codes with a redundancy of <inline-formula> <tex-math>$log n+O(1)$ </tex-math></inline-formula>, where the logarithm base is 2. Subsequently, we improve the redundancy to address specific types of burst-substitution errors, such as inversion errors, adjacent-block-transposition errors, and absorption errors. Moreover, by utilizing the method developed in the construction of burst-substitution correcting codes and imposing additional run-length-limited constraints, locally-bounded constraints, and strong-locally-balanced constraints, respectively, we introduce three constructions of t-burst-deletion correcting codes, each requiring a redundancy of <inline-formula> <tex-math>$log n+O(log log n)$ </tex-math></inline-formula>. Any t-burst-deletion-correcting code is also a t-burst-insertion correcting code, allowing us to intersect the t-burst-substitution-correcting codes and t-burst-deletion-correcting codes designed above to derive three constructions of q-ary t-burst-edit-correcting codes. The first two constructions have a redundancy of <inline-formula> <tex-math>$log n+(tlog q-1)log log n+O(1)$ </tex-math></inline-formula>, while the third construction has a redundancy of <inline-formula> <tex-math>$log n+log log n+O(1)$ </tex-math></inline-formula>. Most of the proposed codes demonstrate superior performance compared to previous results, with the exception of burst-deletion correcting codes. Furthermore, in cases of single-edit errors (t-burst-edit error with <inline-formula> <tex-math>$t=1$ </tex-math></inline-formula>), the redundancy of the first two constructions of quaternary single-edit correcting codes outperforms the results of Gabrys et al. (IEEE Trans. Inf. Theory 2023). 
We also provide efficient encoding and decoding algorithms for our codes to enhance their practical usability.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1631-1646"},"PeriodicalIF":2.2,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Private Noisy Side Information Helps to Increase the Capacity of SPIR
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-15 | DOI: 10.1109/TIT.2025.3530400
Hassan ZivariFard;Rémi A. Chou;Xiaodong Wang
Noiseless private side information does not reduce the download cost in Symmetric Private Information Retrieval (SPIR) unless the client knows all but one file. While this is a pessimistic result, we explore in this paper whether noisy side information available at the client helps decrease the download cost in the context of SPIR with colluding and replicated servers. Specifically, we assume that the client possesses noisy side information about each stored file, which is obtained by passing each file through one of D possible discrete memoryless test channels. The statistics of the test channels are known by the client and by all the servers, but the mapping $\boldsymbol{\mathcal{M}}$ between the files and the test channels is unknown to the servers. We study this problem under two privacy metrics. Under the first metric, the client wants to preserve the privacy of its file selection and the mapping $\boldsymbol{\mathcal{M}}$, and the servers want to preserve the privacy of all the non-selected files. Under the second metric, the client is willing to reveal the index of the test channel that is associated with its desired file. For both privacy metrics, we derive the optimal common randomness and download cost. Our setup generalizes SPIR with colluding servers and SPIR with private noiseless side information. Unlike noiseless side information, our results demonstrate that noisy side information can reduce the download cost, even when the client does not have noiseless knowledge of all but one file.
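As a concrete picture of the storage model only (not the retrieval scheme), the sketch below generates the client's noisy side information by passing each of K stored files through one of D binary symmetric test channels, with the file-to-channel mapping drawn at random; all names, sizes, and channel parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

K, L, D = 4, 16, 2                        # files, bits per file, test channels
files = rng.integers(0, 2, size=(K, L))   # replicated across all servers
crossover = [0.05, 0.30]                  # BSC parameters of the D test channels
M = rng.integers(0, D, size=K)            # file-to-channel mapping, hidden from servers

# Client's side information: file k observed through test channel M[k].
side_info = np.array([
    files[k] ^ (rng.random(L) < crossover[M[k]]).astype(int)
    for k in range(K)
])
print("channel per file:", M)
print("bit flips per file:", (side_info ^ files).sum(axis=1))
```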
{"title":"Private Noisy Side Information Helps to Increase the Capacity of SPIR","authors":"Hassan ZivariFard;Rémi A. Chou;Xiaodong Wang","doi":"10.1109/TIT.2025.3530400","DOIUrl":"https://doi.org/10.1109/TIT.2025.3530400","url":null,"abstract":"Noiseless private side information does not reduce the download cost in Symmetric Private Information Retrieval (SPIR) unless the client knows all but one file. While this is a pessimistic result, we explore in this paper whether noisy client side information available at the client helps decrease the download cost in the context of SPIR with colluding and replicated servers. Specifically, we assume that the client possesses noisy side information about each stored file, which is obtained by passing each file through one of D possible discrete memoryless test channels. The statistics of the test channels are known by the client and by all the servers, but the mapping <inline-formula> <tex-math>$boldsymbol {mathcal {M}}$ </tex-math></inline-formula> between the files and the test channels is unknown to the servers. We study this problem under two privacy metrics. Under the first metric, the client wants to preserve the privacy of its file selection and the mapping <inline-formula> <tex-math>$boldsymbol {mathcal {M}}$ </tex-math></inline-formula>, and the servers want to preserve the privacy of all the non-selected files. Under the second metric, the client is willing to reveal the index of the test channel that is associated with its desired file. For both privacy metrics, we derive the optimal common randomness and download cost. Our setup generalizes SPIR with colluding servers and SPIR with private noiseless side information. Unlike noiseless side information, our results demonstrate that noisy side information can reduce the download cost, even when the client does not have noiseless knowledge of all but one file.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2140-2156"},"PeriodicalIF":2.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-15 | DOI: 10.1109/TIT.2025.3530355
Yifei Wang;Yixuan Hua;Emmanuel J. Candès;Mert Pilanci
The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
{"title":"Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions","authors":"Yifei Wang;Yixuan Hua;Emmanuel J. Candès;Mert Pilanci","doi":"10.1109/TIT.2025.3530355","DOIUrl":"https://doi.org/10.1109/TIT.2025.3530355","url":null,"abstract":"The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1926-1977"},"PeriodicalIF":2.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generalized Quantum Data-Syndrome Codes and Belief Propagation Decoding for Phenomenological Noise
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-14 | DOI: 10.1109/TIT.2025.3529773
Kao-Yueh Kuo;Ching-Yi Lai
Quantum stabilizer codes often struggle with syndrome errors due to measurement imperfections. Typically, multiple rounds of syndrome extraction are employed to ensure reliable error information. In this paper, we consider phenomenological decoding problems, where data qubit errors may occur between extractions, and each measurement can be faulty. We introduce generalized quantum data-syndrome codes along with a generalized check matrix that integrates both quaternary and binary alphabets to represent diverse error sources. This results in a Tanner graph with mixed variable nodes, enabling the design of belief propagation (BP) decoding algorithms that effectively handle phenomenological errors. Importantly, our BP decoders are applicable to general sparse quantum codes. Through simulations, we achieve an error threshold of more than 3% for quantum memory protected by rotated toric codes, using solely BP without post-processing. Our results indicate that d rounds of syndrome extraction are sufficient for a toric code of distance d. We observe that at high error rates, fewer rounds of syndrome extraction tend to perform better, while more rounds improve performance at lower error rates. Additionally, we propose a method to construct effective redundant stabilizer checks for single-shot error correction. Our simulations show that BP decoding remains highly effective even with a high syndrome error rate.
{"title":"Generalized Quantum Data-Syndrome Codes and Belief Propagation Decoding for Phenomenological Noise","authors":"Kao-Yueh Kuo;Ching-Yi Lai","doi":"10.1109/TIT.2025.3529773","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529773","url":null,"abstract":"Quantum stabilizer codes often struggle with syndrome errors due to measurement imperfections. Typically, multiple rounds of syndrome extraction are employed to ensure reliable error information. In this paper, we consider phenomenological decoding problems, where data qubit errors may occur between extractions, and each measurement can be faulty. We introduce generalized quantum data-syndrome codes along with a generalized check matrix that integrates both quaternary and binary alphabets to represent diverse error sources. This results in a Tanner graph with mixed variable nodes, enabling the design of belief propagation (BP) decoding algorithms that effectively handle phenomenological errors. Importantly, our BP decoders are applicable to general sparse quantum codes. Through simulations, we achieve an error threshold of more than 3% for quantum memory protected by rotated toric codes, using solely BP without post-processing. Our results indicate that d rounds of syndrome extraction are sufficient for a toric code of distance d. We observe that at high error rates, fewer rounds of syndrome extraction tend to perform better, while more rounds improve performance at lower error rates. Additionally, we propose a method to construct effective redundant stabilizer checks for single-shot error correction. Our simulations show that BP decoding remains highly effective even with a high syndrome error rate.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1824-1840"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimal Estimation of the Null Distribution in Large-Scale Inference
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-14 | DOI: 10.1109/TIT.2025.3529457
Subhodh Kotekal;Chao Gao
The advent of large-scale inference has spurred reexamination of conventional statistical thinking. In a series of highly original articles, Efron persuasively illustrated the danger for downstream inference in assuming the veracity of a posited null distribution. In a Gaussian model for n z-scores with at most $k < \frac{n}{2}$ nonnulls, Efron suggests estimating the parameters of an empirical null $N(\theta, \sigma^{2})$ instead of assuming the theoretical null $N(0, 1)$. Looking to the robust statistics literature by viewing the nonnulls as outliers is unsatisfactory, as the question of optimal rates is still open; even consistency is not known in the regime $k \asymp n$, which is especially relevant to many large-scale inference applications. However, provably rate-optimal robust estimators have been developed in other models (e.g., Huber contamination) which appear quite close to Efron's proposal. Notably, the impossibility of consistency when $k \asymp n$ in these other models may suggest the same major weakness afflicts Efron's popularly adopted recommendation. A sound evaluation thus requires a complete understanding of information-theoretic limits. We characterize the regime of k for which consistent estimation is possible, notably without imposing any assumptions at all on the nonnull effects. Unlike in other robust models, it is shown that consistent estimation of the location parameter is possible if and only if $\frac{n}{2} - k = \omega(\sqrt{n})$, and of the scale parameter in the entire regime $k < \frac{n}{2}$. Furthermore, we establish sharp minimax rates and show that estimators based on the empirical characteristic function are optimal by exploiting the Gaussian character of the data.
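To see why the empirical characteristic function is a natural estimator here, consider the following two-frequency sketch. It is only in the spirit of the paper's construction (the exact estimator and its tuning are not given above): for moderate frequencies u, sparse nonnulls perturb the empirical CF little, so its phase recovers $\theta$ and its log-magnitude decay recovers $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated z-scores: null component N(theta, sigma^2) plus k sparse nonnulls.
theta, sigma, n, k = 0.3, 1.2, 100_000, 2_000
z = rng.normal(theta, sigma, n)
z[:k] += rng.normal(6.0, 1.0, k)         # nonnull effects, far from the null

def ecf(z, u):
    # Empirical characteristic function at frequency u.
    return np.mean(np.exp(1j * u * z))

# For the null part, arg ecf(u) ~ u*theta and log|ecf(u)| ~ c - u^2 sigma^2 / 2,
# so two frequencies identify (theta, sigma); the nonnulls enter only as a
# small additive perturbation of the CF.
u1, u2 = 0.5, 1.0
p1, p2 = ecf(z, u1), ecf(z, u2)
sigma_hat = np.sqrt(2 * (np.log(abs(p1)) - np.log(abs(p2))) / (u2**2 - u1**2))
theta_hat = np.angle(p1) / u1
print(f"theta_hat = {theta_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```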
{"title":"Optimal Estimation of the Null Distribution in Large-Scale Inference","authors":"Subhodh Kotekal;Chao Gao","doi":"10.1109/TIT.2025.3529457","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529457","url":null,"abstract":"The advent of large-scale inference has spurred reexamination of conventional statistical thinking. In a series of highly original articles, Efron persuasively illustrated the danger for downstream inference in assuming the veracity of a posited null distribution. In a Gaussian model for n many z-scores with at most <inline-formula> <tex-math>$k lt frac {n}{2}$ </tex-math></inline-formula> nonnulls, Efron suggests estimating the parameters of an empirical null <inline-formula> <tex-math>$N(theta , sigma ^{2})$ </tex-math></inline-formula> instead of assuming the theoretical null <inline-formula> <tex-math>$N(0, 1)$ </tex-math></inline-formula>. Looking to the robust statistics literature by viewing the nonnulls as outliers is unsatisfactory as the question of optimal rates is still open; even consistency is not known in the regime <inline-formula> <tex-math>$k asymp n$ </tex-math></inline-formula> which is especially relevant to many large-scale inference applications. However, provably rate-optimal robust estimators have been developed in other models (e.g. Huber contamination) which appear quite close to Efron’s proposal. Notably, the impossibility of consistency when <inline-formula> <tex-math>$k asymp n$ </tex-math></inline-formula> in these other models may suggest the same major weakness afflicts Efron’s popularly adopted recommendation. A sound evaluation thus requires a complete understanding of information-theoretic limits. We characterize the regime of k for which consistent estimation is possible, notably without imposing any assumptions at all on the nonnull effects. Unlike in other robust models, it is shown consistent estimation of the location parameter is possible if and only if <inline-formula> <tex-math>$frac {n}{2} {-} k = omega (sqrt {n})$ </tex-math></inline-formula>, and of the scale parameter in the entire regime <inline-formula> <tex-math>$k lt frac {n}{2}$ </tex-math></inline-formula>. Furthermore, we establish sharp minimax rates and show estimators based on the empirical characteristic function are optimal by exploiting the Gaussian character of the data.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2075-2103"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tight Exponential Strong Converse for Source Coding Problem With Encoded Side Information
IF 2.2 | CAS Tier 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-01-14 | DOI: 10.1109/TIT.2025.3529612
Daisuke Takeuchi;Shun Watanabe
The source coding problem with encoded side information is considered. A lower bound on the strong converse exponent has been derived by Oohama, but its tightness has not been clarified. In this paper, we derive a tight strong converse exponent. For the special case where the side information does not exist, we demonstrate that our tight exponent of the Wyner-Ahlswede-Körner (WAK) problem reduces to the known tight expression of that special case while Oohama's lower bound is strictly loose. The converse part is proved by a judicious use of the change-of-measure argument, which was introduced by Gu and Effros and further developed by Tyagi and Watanabe. A key component of the methodology by Tyagi and Watanabe is the use of the soft Markov constraint, which was originally introduced by Oohama, as a penalty term to prove the Markov constraint at the end. A technical innovation of this paper compared to Tyagi and Watanabe is recognizing that the soft Markov constraint is a part of the exponent, rather than a penalty term that should vanish at the end; this recognition enables us to derive the matching achievability bound. In fact, via numerical experiment, we provide evidence that the soft Markov constraint is strictly positive. Compared to Oohama's derivation of the lower bound, which relies on the single-letterization of a certain moment-generating function, the derivation of our tight exponent only involves manipulations of the Kullback-Leibler divergence and Shannon entropies. The achievability part is derived by a careful analysis of the type argument; however, unlike the conventional analysis for the achievable rate region, we need to derive the soft Markov constraint in the analysis of the correct probability. Furthermore, we present an application of our derivation of the strong converse exponent to privacy amplification.
{"title":"Tight Exponential Strong Converse for Source Coding Problem With Encoded Side Information","authors":"Daisuke Takeuchi;Shun Watanabe","doi":"10.1109/TIT.2025.3529612","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529612","url":null,"abstract":"The source coding problem with encoded side information is considered. A lower bound on the strong converse exponent has been derived by Oohama, but its tightness has not been clarified. In this paper, we derive a tight strong converse exponent. For the special case where the side-information does not exist, we demonstrate that our tight exponent of the Wyner-Ahlswede-Körner (WAK) problem reduces to the known tight expression of that special case while Oohama’s lower bound is strictly loose. The converse part is proved by a judicious use of the change-of-measure argument, which was introduced by Gu and Effros and further developed by Tyagi and Watanabe. A key component of the methodology by Tyagi and Watanabe is the use of soft Markov constraint, which was originally introduced by Oohama, as a penalty term to prove the Markov constraint at the end. A technical innovation of this paper compared to Tyagi and Watanabe is recognizing that the soft Markov constraint is a part of the exponent, rather than a penalty term that should vanish at the end; this recognition enables us to derive the matching achievability bound. In fact, via numerical experiment, we provide evidence that the soft Markov constraint is strictly positive. Compared to Oohama’s derivation of the lower bound, which relies on the single-letterization of a certain moment-generating function, the derivation of our tight exponent only involves manipulations of the Kullback-Leibrer divergence and Shannon entropies. The achievability part is derived by a careful analysis of the type argument; however, unlike the conventional analysis for the achievable rate region, we need to derive the soft Markov constraint in the analysis of the correct probability. Furthermore, we present an application of our derivation of the strong converse exponent to the privacy amplification.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1533-1545"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0