Determined Multichannel Blind Source Separation with Clustered Source Model
Jianyu Wang, Shanzheng Guan
arXiv - CS - Sound, 2024-05-06
DOI: arxiv-2405.03118 (https://doi.org/arxiv-2405.03118)
Citations: 0
Abstract
The independent low-rank matrix analysis (ILRMA) method stands out as a
prominent technique for multichannel blind audio source separation. It
leverages nonnegative matrix factorization (NMF) and nonnegative canonical
polyadic decomposition (NCPD) to model source parameters. While the NMF model
effectively captures the low-rank structure of sources, it overlooks
inter-channel dependencies. On the other hand, NCPD preserves intrinsic
structure but lacks interpretable latent factors, making it challenging to
incorporate prior information as constraints. To address these limitations, we
introduce a clustered source model based on nonnegative block-term
decomposition (NBTD). This model defines blocks as outer products of vectors
(clusters) and matrices (for spectral structure modeling), offering
interpretable latent vectors. Moreover, it enables straightforward integration
of orthogonality constraints to ensure independence among source images.
Experimental results demonstrate that our proposed method outperforms ILRMA and
its extensions in anechoic conditions and surpasses the original ILRMA in
simulated reverberant environments.
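
To make the "blocks as outer products of vectors and matrices" idea concrete, the following is a minimal NumPy sketch of a generic nonnegative block-term reconstruction. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not give the model equations, so the tensor layout (frequency x time x source), the factor names `spectral_bases`, `activations`, and `cluster_vectors`, the function name `nbtd_reconstruct`, and all sizes are hypothetical. Each block's matrix is assumed here to be a low-rank NMF-style product, and each cluster vector is assumed to index sources.

```python
import numpy as np


def nbtd_reconstruct(spectral_bases, activations, cluster_vectors):
    """Sum of R blocks, each the outer product of a matrix and a vector.

    Hypothetical layout (not taken from the paper):
      spectral_bases:  (R, F, L) nonnegative
      activations:     (R, L, T) nonnegative
      cluster_vectors: (R, N)    nonnegative
    Returns a nonnegative tensor of shape (F, T, N).
    """
    # Matrix part of each block: an F x T matrix of rank at most L.
    block_matrices = spectral_bases @ activations  # shape (R, F, T)
    # Outer product of every block matrix with its cluster vector,
    # then a sum over the R blocks.
    return np.einsum("rft,rn->ftn", block_matrices, cluster_vectors)


rng = np.random.default_rng(0)
R, F, T, N, L = 2, 6, 5, 2, 3  # illustrative sizes only

W = rng.random((R, F, L))
H = rng.random((R, L, T))

# Nonnegative vectors that are mutually orthogonal must have disjoint
# supports, so each source index is claimed by at most one block.
C = np.eye(N)[:R]

X = nbtd_reconstruct(W, H, C)
gram = C @ C.T  # off-diagonal entries are zero for orthogonal clusters
```

With orthogonal nonnegative cluster vectors, the slice `X[:, :, n]` receives contributions only from the block assigned to source `n`, which is one way to read the abstract's remark that the latent vectors are interpretable. How the paper actually imposes the orthogonality constraint during optimization is not stated in the abstract and is not modeled here.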