Jamie Haddock, Lara Kassab, Alona Kryshchenko, D. Needell
{"title":"非负CP张量分解对噪声的鲁棒性","authors":"Jamie Haddock, Lara Kassab, Alona Kryshchenko, D. Needell","doi":"10.1109/ITA50056.2020.9244932","DOIUrl":null,"url":null,"abstract":"In today’s data-driven world, there is an unprecedented demand for large-scale temporal data analysis. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower dimensional nonnegative matrices. With this approach, however, noise can have devastating effects on the learned latent topics and obscure the true topics in the data. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors. We show experimental evidence that suggests that NNCPD is robust to noise in the data when one overestimates the CP rank of the tensor.","PeriodicalId":137257,"journal":{"name":"2020 Information Theory and Applications Workshop (ITA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"On Nonnegative CP Tensor Decomposition Robustness to Noise\",\"authors\":\"Jamie Haddock, Lara Kassab, Alona Kryshchenko, D. Needell\",\"doi\":\"10.1109/ITA50056.2020.9244932\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In today’s data-driven world, there is an unprecedented demand for large-scale temporal data analysis. Dynamic topic modeling has been widely used in social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. 
Previous work on dynamic topic modeling primarily employ the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower dimensional nonnegative matrices. With this approach, however, noise can have devastating effects on the learned latent topics and obscure the true topics in the data. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors. We show experimental evidence that suggests that NNCPD is robust to noise in the data when one overestimates the CP rank of the tensor.\",\"PeriodicalId\":137257,\"journal\":{\"name\":\"2020 Information Theory and Applications Workshop (ITA)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Information Theory and Applications Workshop (ITA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ITA50056.2020.9244932\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Information Theory and Applications Workshop (ITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITA50056.2020.9244932","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On Nonnegative CP Tensor Decomposition Robustness to Noise
In today’s data-driven world, there is an unprecedented demand for large-scale temporal data analysis. Dynamic topic modeling has been widely used in the social and data sciences with the goal of learning latent topics that emerge, evolve, and fade over time. Previous work on dynamic topic modeling primarily employs the method of nonnegative matrix factorization (NMF), where slices of the data tensor are each factorized into the product of lower-dimensional nonnegative matrices. With this approach, however, noise can have devastating effects on the learned latent topics and obscure the true topics in the data. To overcome this issue, we propose instead adopting the method of nonnegative CANDECOMP/PARAFAC (CP) tensor decomposition (NNCPD), where the data tensor is directly decomposed into a minimal sum of outer products of nonnegative vectors. We present experimental evidence suggesting that NNCPD is robust to noise in the data when one overestimates the CP rank of the tensor.