{"title":"LASERS:用于生成建模的稀疏性表征 LAtent Space 编码","authors":"Xin Li, Anand Sarwate","doi":"arxiv-2409.11184","DOIUrl":null,"url":null,"abstract":"Learning compact and meaningful latent space representations has been shown\nto be very useful in generative modeling tasks for visual data. One particular\nexample is applying Vector Quantization (VQ) in variational autoencoders\n(VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance\nin many modern generative modeling applications. Quantizing the latent space\nhas been justified by the assumption that the data themselves are inherently\ndiscrete in the latent space (like pixel values). In this paper, we propose an\nalternative representation of the latent space by relaxing the structural\nassumption than the VQ formulation. Specifically, we assume that the latent\nspace can be approximated by a union of subspaces model corresponding to a\ndictionary-based representation under a sparsity constraint. The dictionary is\nlearned/updated during the training process. We apply this approach to look at\ntwo models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs\nwith Generative Adversarial Networks (DL-GANs). We show empirically that our\nmore latent space is more expressive and has leads to better representations\nthan the VQ approach in terms of reconstruction quality at the expense of a\nsmall computational overhead for the latent space computation. Our results thus\nsuggest that the true benefit of the VQ approach might not be from\ndiscretization of the latent space, but rather the lossy compression of the\nlatent space. We confirm this hypothesis by showing that our sparse\nrepresentations also address the codebook collapse issue as found common in\nVQ-family models.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling\",\"authors\":\"Xin Li, Anand Sarwate\",\"doi\":\"arxiv-2409.11184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learning compact and meaningful latent space representations has been shown\\nto be very useful in generative modeling tasks for visual data. One particular\\nexample is applying Vector Quantization (VQ) in variational autoencoders\\n(VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance\\nin many modern generative modeling applications. Quantizing the latent space\\nhas been justified by the assumption that the data themselves are inherently\\ndiscrete in the latent space (like pixel values). In this paper, we propose an\\nalternative representation of the latent space by relaxing the structural\\nassumption than the VQ formulation. Specifically, we assume that the latent\\nspace can be approximated by a union of subspaces model corresponding to a\\ndictionary-based representation under a sparsity constraint. The dictionary is\\nlearned/updated during the training process. We apply this approach to look at\\ntwo models: Dictionary Learning Variational Autoencoders (DL-VAEs) and DL-VAEs\\nwith Generative Adversarial Networks (DL-GANs). 
We show empirically that our\\nmore latent space is more expressive and has leads to better representations\\nthan the VQ approach in terms of reconstruction quality at the expense of a\\nsmall computational overhead for the latent space computation. Our results thus\\nsuggest that the true benefit of the VQ approach might not be from\\ndiscretization of the latent space, but rather the lossy compression of the\\nlatent space. We confirm this hypothesis by showing that our sparse\\nrepresentations also address the codebook collapse issue as found common in\\nVQ-family models.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling
Learning compact and meaningful latent space representations has been shown
to be very useful in generative modeling tasks for visual data. One particular
example is applying Vector Quantization (VQ) in variational autoencoders
(VQ-VAEs, VQ-GANs, etc.), which has demonstrated state-of-the-art performance
in many modern generative modeling applications. Quantizing the latent space
has been justified by the assumption that the data themselves are inherently
discrete in the latent space (like pixel values).
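For context, the VQ quantization step referenced above is a nearest-neighbor
lookup against a learned codebook. The sketch below illustrates only that lookup
in NumPy; the codebook size and latent dimension are arbitrary illustrative
choices, not values from this paper, and the rest of the VQ-VAE machinery (e.g.
the straight-through gradient estimator) is omitted.

```python
import numpy as np

def vq_quantize(z, codebook):
    """Map each latent vector in z to its nearest codebook entry.

    z:        (N, D) array of encoder outputs.
    codebook: (K, D) array of learned code vectors.
    Returns the quantized latents and the chosen code indices.
    """
    # Squared Euclidean distance from every latent to every code vector.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = dists.argmin(axis=1)                                          # (N,)
    return codebook[idx], idx

# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))          # 8 latent vectors of dimension 16
codebook = rng.normal(size=(64, 16))  # codebook with K = 64 entries
z_q, idx = vq_quantize(z, codebook)
print(z_q.shape, idx)
```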
In this paper, we propose an alternative representation of the latent space by
relaxing the structural assumption of the VQ formulation. Specifically, we
assume that the latent space can be approximated by a union-of-subspaces model
corresponding to a dictionary-based representation under a sparsity constraint.
The dictionary is learned and updated during the training process.
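To make the dictionary-based idea concrete, the sketch below approximates each
latent vector as a sparse combination of dictionary atoms using a few ISTA
(iterative soft-thresholding) steps. The dictionary size, sparsity penalty, and
choice of ISTA are illustrative assumptions for exposition, not the paper's
actual training procedure.

```python
import numpy as np

def sparse_code(z, D, lam=0.1, n_steps=50):
    """Approximate z ~= A @ D with sparse coefficients A via ISTA.

    z: (N, d) latent vectors; D: (K, d) dictionary with K atoms.
    Minimizes 0.5 * ||A @ D - z||^2 + lam * ||A||_1 over A.
    """
    L = np.linalg.norm(D @ D.T, 2)          # Lipschitz constant of the gradient
    A = np.zeros((z.shape[0], D.shape[0]))  # sparse codes, one row per latent
    for _ in range(n_steps):
        grad = (A @ D - z) @ D.T            # gradient of the quadratic term
        A = A - grad / L                    # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft-threshold
    return A

# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                    # 8 latent vectors of dimension 16
D = rng.normal(size=(64, 16))                   # dictionary of 64 atoms
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
A = sparse_code(z, D)
print("nonzeros per latent:", (np.abs(A) > 1e-8).sum(axis=1))
print("reconstruction error:", np.linalg.norm(A @ D - z))
```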
We apply this approach to two models: Dictionary Learning Variational
Autoencoders (DL-VAEs) and DL-VAEs with Generative Adversarial Networks
(DL-GANs). We show empirically that our latent space is more expressive and
leads to better representations than the VQ approach in terms of reconstruction
quality, at the expense of a small computational overhead for computing the
latent representation.
Our results thus suggest that the true benefit of the VQ approach might come
not from discretization of the latent space, but rather from lossy compression
of the latent space. We confirm this hypothesis by showing that our sparse
representations also address the codebook collapse issue commonly found in
VQ-family models.
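Codebook collapse refers to a VQ model mapping most inputs to only a handful of
codebook entries. A simple, generic way to detect it, shown below as an
illustrative sketch rather than the paper's evaluation protocol, is to measure
codebook usage and the perplexity of the code-assignment distribution.

```python
import numpy as np

def codebook_stats(indices, codebook_size):
    """Fraction of codes used and perplexity of the code-usage distribution.

    A healthy codebook has usage close to 1.0 and perplexity close to
    codebook_size; collapse shows up as values far below those.
    """
    counts = np.bincount(indices, minlength=codebook_size)
    usage = (counts > 0).mean()
    p = counts / counts.sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    return usage, float(np.exp(entropy))

# Toy example (not from the paper): 10,000 assignments over a 512-entry codebook.
rng = np.random.default_rng(0)
collapsed = rng.integers(0, 4, size=10_000)    # only 4 codes ever used
print(codebook_stats(collapsed, 512))          # low usage, low perplexity
healthy = rng.integers(0, 512, size=10_000)    # codes used roughly uniformly
print(codebook_stats(healthy, 512))            # usage near 1.0, perplexity near 512
```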