Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746606
Huiyu Duan, Xiongkuo Min, Wei Shen, Guangtao Zhai
A single superimposed image containing two image views causes visual confusion for both human vision and computer vision. Human vision needs a "develop-then-rival" process to decompose the superimposed image into two individual images, which effectively suppresses visual confusion. In this paper, we propose a human vision-inspired framework for separating superimposed images. We first propose a network to simulate the development stage, which tries to understand and distinguish the semantic information of the two layers of a single superimposed image. To further simulate the rivalry activation/suppression process in human brains, we carefully design a rivalry stage, which combines the original mixed input (the superimposed image) with the activated visual information (the outputs of the development stage), and then performs rivalry to obtain images without ambiguity. Experimental results show that our framework effectively separates superimposed images and achieves significantly better output quality than state-of-the-art methods.
Title: A Unified Two-Stage Model for Separating Superimposed Images (ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing)
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746419
Jiapeng Li, Ge Li, Thomas H. Li
To cope with the extreme variations of illumination and rotation in the real world, popular descriptors have recently captured more invariance, but more invariance makes descriptors less informative. This paper therefore designs an attention-guided framework (named AISLFD) to select appropriate invariance for local feature descriptors, which boosts the performance of descriptors even in scenes with extreme changes. Specifically, we first explore an efficient multi-scale feature extraction module that provides our local descriptors with more useful information. In addition, we propose a novel parallel self-attention module to obtain meta descriptors with a global receptive field, which guides the invariance selection more accurately. Extensive experiments show that our method achieves performance competitive with state-of-the-art methods.
Title: Attention Guided Invariance Selection for Local Feature Descriptors
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746867
P. B. Gohain, M. Jansson
Extended Bayesian information criterion (EBIC) and extended Fisher information criterion (EFIC) are two popular criteria for model selection in sparse high-dimensional linear regression models. However, EBIC is inconsistent in scenarios where the signal-to-noise ratio (SNR) is high but the sample size is small, and EFIC is not invariant to data scaling, which affects its performance under different signal and noise statistics. In this paper, we present a refined criterion called EBICR, where the ‘R’ stands for robust. EBICR is an improved version of EBIC and EFIC: it is scale-invariant and a consistent estimator of the true model as the sample size grows large and/or as the SNR tends to infinity. The performance of EBICR is compared to that of existing methods such as EBIC, EFIC and the multi-beta-test (MBT). Simulation results indicate that the performance of EBICR in identifying the true model is on par with or superior to that of the other considered methods.
Title: New Improved Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models
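The abstract above does not give the EBICR formula itself. For background only, here is a minimal sketch of the classical EBIC (the criterion EBICR refines), scored over a few candidate supports in a toy sparse regression; the function names and toy data are illustrative, not from the paper.

```python
import numpy as np
from math import lgamma, log

def log_binom(p, k):
    # log of the binomial coefficient C(p, k), via log-gamma
    return lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)

def ebic(y, X, support, gamma=1.0):
    """Classical EBIC (Chen & Chen, 2008) for a candidate support set:
    EBIC = n*log(RSS/n) + k*log(n) + 2*gamma*log C(p, k),
    where k = |support| and p is the total number of predictors."""
    n, p = X.shape
    k = len(support)
    if k == 0:
        rss = float(y @ y)
    else:
        Xs = X[:, list(support)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        r = y - Xs @ beta
        rss = float(r @ r)
    return n * log(rss / n) + k * log(n) + 2 * gamma * log_binom(p, k)

# Toy problem: only columns 0 and 1 carry signal.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)
scores = {s: ebic(y, X, s) for s in [(0,), (0, 1), (0, 1, 2)]}
best = min(scores, key=scores.get)  # the true support (0, 1) should win
```

The extended penalty term `2*gamma*log C(p, k)` is what distinguishes EBIC from plain BIC; it accounts for the number of models of each size, which matters when p is large relative to n.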
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746726
Yuchen Sun, Kejun Huang
We propose a new algorithm called higher-order QR iteration (HOQRI) for computing the Tucker decomposition of large and sparse tensors. Compared to the celebrated higher-order orthogonal iteration (HOOI), HOQRI relies on a simple orthogonalization step in each iteration rather than the more sophisticated singular value decomposition step used in HOOI. More importantly, when dealing with extremely large and sparse data tensors, HOQRI completely eliminates the intermediate memory explosion by defining a new sparse tensor operation called TTMcTC. Furthermore, HOQRI is shown to monotonically improve the objective function, thus enjoying the same convergence guarantee as HOOI. Numerical experiments on synthetic and real data showcase the effectiveness of HOQRI.
Title: HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
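The sparse TTMcTC kernel is not specified in the abstract, so it cannot be reproduced from it. The dense sketch below illustrates only the core idea as we read it: each factor update forms the same multilinear product as HOOI but orthonormalizes with a thin QR (one step of orthogonal iteration) instead of a truncated SVD. Function names are hypothetical, and this is a sketch of the update rule, not the authors' implementation.

```python
import numpy as np

def unfold(T, mode):
    # Mode-`mode` matricization of a 3-way tensor.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multi(T, mats):
    # Multilinear product T x_0 mats[0]^T x_1 mats[1]^T x_2 mats[2]^T.
    return np.einsum('abc,ai,bj,ck->ijk', T, *mats)

def hoqri_like(T, ranks, n_iter=50, seed=0):
    """HOQRI-style alternating factor updates (dense sketch).

    U[m] is updated by a thin QR of Y_(m) G_(m)^T, where Y is the
    multilinear product over the other modes and G is the current core;
    algebraically this is one orthogonal-iteration step on Y_(m) Y_(m)^T.
    """
    rng = np.random.default_rng(seed)
    U = [np.linalg.qr(rng.standard_normal((T.shape[m], r)))[0]
         for m, r in enumerate(ranks)]
    norms = []  # ||core||_F per sweep; claimed to improve monotonically
    for _ in range(n_iter):
        for m in range(3):
            G = multi(T, U)  # current core tensor
            Us = [np.eye(T.shape[k]) if k == m else U[k] for k in range(3)]
            Y = multi(T, Us)  # T multiplied by U_k^T for all k != m
            U[m], _ = np.linalg.qr(unfold(Y, m) @ unfold(G, m).T)
        norms.append(np.linalg.norm(multi(T, U)))
    return U, multi(T, U), norms

# Exactly rank-(2, 2, 2) test tensor: the fit should be near-exact.
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 2, 2))
A, B, C = (np.linalg.qr(rng.standard_normal((n, 2)))[0] for n in (8, 9, 10))
T = np.einsum('ijk,ai,bj,ck->abc', core, A, B, C)
U, G, norms = hoqri_like(T, (2, 2, 2))
That = np.einsum('ijk,ai,bj,ck->abc', G, *U)
rel_err = np.linalg.norm(T - That) / np.linalg.norm(T)
```

Since the QR step is an orthogonal-iteration step on a positive semidefinite matrix, the objective `||core||_F` is non-decreasing across sweeps, mirroring the monotonicity claim in the abstract.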
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746145
P. L. Combettes, Zev Woodstock
We show that many nonlinear observation models in signal recovery can be represented using firmly nonexpansive operators. To address problems with inaccurate measurements, we propose solving a variational inequality relaxation which is guaranteed to possess solutions under mild conditions and which coincides with the original problem if it happens to be consistent. We then present an efficient algorithm for its solution, as well as numerical applications in signal and image recovery, including an experimental operator-theoretic method of promoting sparsity.
Title: Signal Recovery from Inconsistent Nonlinear Observations
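As a small illustration of the key property the abstract relies on (not the paper's construction): soft-thresholding, the proximity operator of the ℓ1 norm, is a standard example of a firmly nonexpansive operator, and the defining inequality ||Fx − Fy||² ≤ ⟨Fx − Fy, x − y⟩ can be checked numerically.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximity operator of lam * ||.||_1 -- firmly nonexpansive.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def firmly_nonexpansive_on(F, x, y, tol=1e-12):
    """Check ||Fx - Fy||^2 <= <Fx - Fy, x - y> for one pair (x, y)."""
    d = F(x) - F(y)
    return float(d @ d) <= float(d @ (x - y)) + tol

# Verify the inequality on many random pairs of points.
rng = np.random.default_rng(1)
checks = [
    firmly_nonexpansive_on(lambda v: soft_threshold(v, 0.5),
                           rng.standard_normal(50), rng.standard_normal(50))
    for _ in range(100)
]
```

Any proximity operator of a proper convex lower semicontinuous function satisfies this inequality exactly, which is why such operators fit naturally into the variational inequality framework the abstract describes.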
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9747574
Liming Shi, Guoli Ping, Xiaoxiang Shen, M. G. Christensen
Personal sound field control techniques aim to produce sound fields for different sound contents in different regions of an acoustic space without interference. The limitations of state-of-the-art sound field control methods include high latency and computational complexity, especially when the reverberation time is long and the number of loudspeakers is large. In this paper, we propose a personal sound field control approach that exploits interframe correlation. By taking past frames into account, the proposed method can accommodate long reverberation times with low latency. To find the optimal parameters for the physically meaningful constraints, subspace decomposition and Newton's method are applied. Furthermore, a sound-field-distortion-oriented subspace construction method is proposed to reduce the subspace dimension. Simulation results with measured room impulse responses show that, compared with traditional methods, the proposed algorithm obtains a good trade-off between acoustic contrast and reproduction error at low latency.
Title: Generation of Personal Sound Fields in Reverberant Environments Using Interframe Correlation
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746822
Y. Liu, Jing Liu, Xiaoguang Zhu, Donglai Wei, Xiaohong Huang, Liang Song
The automatic detection of abnormal events in surveillance videos with weak supervision has been formulated as a multiple instance learning task, which aims to temporally localize the clips containing abnormal events using only video-level labels. However, most existing methods rely on features extracted by pre-trained action recognition models, which are not discriminative enough for video anomaly detection. In this work, we propose a spatial-temporal attention mechanism to learn inter- and intra-correlations of video clips, and the boosted features are encouraged to be task-specific via a mutual cosine embedding loss. Experimental results on standard benchmarks demonstrate the effectiveness of the spatial-temporal attention, and our method achieves performance superior to state-of-the-art methods.
Title: Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746238
Ning Wu, Zhaoci Liu, Zhenhua Ling
To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining the VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) used in FastSpeech2 for F0 representation is no longer necessary. Experimental results on a Chinese audiobook dataset show that the proposed method effectively exploits discourse-level linguistic information and outperforms FastSpeech2 in the naturalness and expressiveness of synthetic speech.
Title: Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9747165
Haiyan Yu, Zhen Qin, Zhihui Zhu
Efficiently computing an (approximate) orthonormal basis and low-rank approximation of the input data X plays a crucial role in data analysis. Among the most efficient algorithms for such tasks are randomized algorithms, which proceed by computing a projection XA with a random sketching matrix A of much smaller size, and then computing the orthonormal basis as well as low-rank factorizations of the tall matrix XA. While a random matrix A is the de facto choice, in this work we improve upon its performance by using a learning approach to find an adaptive sketching matrix A from a set of training data. We derive a closed-form expression for the gradient of the training problem, enabling us to use efficient gradient-based algorithms. We also extend this approach to learning structured sketching matrices, such as a sparse sketching matrix that acts by selecting a small number of representative columns from the input data. Our experiments on both synthetic and real data show that both the learned dense and sparse sketching matrices outperform random ones in finding the approximate orthonormal basis and low-rank approximations.
Title: Learning Approach For Fast Approximate Matrix Factorizations
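For context, the classical randomized scheme this abstract builds on (a sketch XA followed by a thin QR, as in Halko-Martinsson-Tropp) can be sketched as follows; in the learned variant, a trained sketching matrix A would simply be passed in place of the Gaussian draw. Names are illustrative, not the authors' API.

```python
import numpy as np

def randomized_lowrank(X, sketch_size, A=None, seed=0):
    """Randomized orthonormal basis and low-rank approximation of X.

    Forms Y = X @ A with a sketching matrix A (Gaussian by default; a
    learned A can be supplied), orthonormalizes Y with a thin QR, and
    returns (Q, Q @ Q.T @ X), the rank-limited approximation of X.
    """
    m, n = X.shape
    if A is None:
        A = np.random.default_rng(seed).standard_normal((n, sketch_size))
    Q, _ = np.linalg.qr(X @ A)  # orthonormal basis for range(X @ A)
    return Q, Q @ (Q.T @ X)

# Exactly rank-3 matrix: a sketch of size >= 3 recovers it almost surely.
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
Q, Xhat = randomized_lowrank(X, sketch_size=5)
err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```

The whole cost is dominated by the product X @ A, which is why shrinking or structuring A (the paper's learned dense and sparse sketches) directly reduces the runtime.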
Pub Date: 2022-05-23 | DOI: 10.1109/icassp43922.2022.9746938
Wenpeng Xing, Jie Chen
We propose Nex+, a neural Multi-Plane Image (MPI) representation with alpha denoising for the task of novel view synthesis (NVS). Overfitting to training data is a common challenge for all learning-based models. We propose a novel solution for resolving this issue in the context of NVS, using signal-denoising-motivated operations over the alpha coefficients of the MPI, without any additional supervision. Nex+ contains a novel 5D Alpha Neural Regulariser (ANR), which favors low-frequency components in the angular domain, i.e., the signal sub-space of the alpha coefficients indicating various viewing directions. ANR's angular low-frequency property derives from its small number of angular encoding levels and output basis. The regularised alpha in Nex+ models the scene geometry more accurately than Nex, and outperforms other state-of-the-art methods on public datasets for the task of NVS.
Title: NEX+: Novel View Synthesis with Neural Regularisation Over Multi-Plane Images