
Latest publications: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A Unified Two-Stage Model for Separating Superimposed Images
Huiyu Duan, Xiongkuo Min, Wei Shen, Guangtao Zhai
A single superimposed image containing two image views causes visual confusion for both human vision and computer vision. Human vision needs a "develop-then-rival" process to decompose a superimposed image into two individual images, which effectively suppresses visual confusion. In this paper, we propose a human-vision-inspired framework for separating superimposed images. We first propose a network to simulate the development stage, which tries to understand and distinguish the semantic information of the two layers of a single superimposed image. To further simulate the rivalry activation/suppression process in the human brain, we carefully design a rivalry stage, which combines the original mixed input (the superimposed image) with the activated visual information (the outputs of the development stage), and then rivals to obtain unambiguous images. Experimental results show that our framework effectively separates superimposed images and significantly outperforms state-of-the-art methods in output quality.
{"title":"A Unified Two-Stage Model for Separating Superimposed Images","authors":"Huiyu Duan, Xiongkuo Min, Wei Shen, Guangtao Zhai","doi":"10.1109/icassp43922.2022.9746606","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746606","url":null,"abstract":"A single superimposed image containing two image views causes visual confusion for both human vision and computer vision. Human vision needs a \"develop-then-rival\" process to decompose the superimposed image into two individual images, which effectively suppresses visual confusion. In this paper, we propose a human vision-inspired framework for separating superimposed images. We first propose a network to simulate the development stage, which tries to understand and distinguish the semantic information of the two layers of a single superimposed image. To further simulate the rivalry activation/suppression process in human brains, we carefully design a rivalry stage, which incorporates the original mixed input (superimposed image), the activated visual information (outputs of the development stage) together, and then rivals to get images without ambiguity. Experimental results show that our novel framework effectively separates the superimposed images and significantly improves the performance with better output quality compared with state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123906339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Attention Guided Invariance Selection for Local Feature Descriptors
Jiapeng Li, Ge Li, Thomas H. Li
To cope with the extreme variations of illumination and rotation in the real world, popular descriptors have recently captured more invariance, but more invariance makes descriptors less informative. This paper therefore designs an attention-guided framework (named AISLFD) to select appropriate invariance for local feature descriptors, which boosts descriptor performance even in scenes with extreme changes. Specifically, we first explore an efficient multi-scale feature extraction module that provides our local descriptors with more useful information. In addition, we propose a novel parallel self-attention module to obtain meta descriptors with a global receptive field, which guides the invariance selection more accurately. Extensive experiments show that our method achieves performance competitive with state-of-the-art methods.
{"title":"Attention Guided Invariance Selection for Local Feature Descriptors","authors":"Jiapeng Li, Ge Li, Thomas H. Li","doi":"10.1109/icassp43922.2022.9746419","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746419","url":null,"abstract":"To copy with the extreme variations of illumination and rotation in the real world, popular descriptors have captured more invariance recently, but more invariance makes descriptors less informative. So this paper designs a unique attention guided framework (named AISLFD) to select appropriate invariance for local feature descriptors, which boosts the performance of descriptors even in the scenes with extreme changes. Specifically, we first explore an efficient multi-scale feature extraction module that provides our local descriptors with more useful information. Besides, we propose a novel parallel self-attention module to get meta descriptors with the global receptive field, which guides the invariance selection more correctly. Compared with state-of-the-art methods, our method achieves competitive performance through sufficient experiments.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123508287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
New Improved Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models
P. B. Gohain, M. Jansson
The extended Bayesian information criterion (EBIC) and the extended Fisher information criterion (EFIC) are two popular criteria for model selection in sparse high-dimensional linear regression models. However, EBIC is inconsistent when the signal-to-noise ratio (SNR) is high but the sample size is small, and EFIC is not invariant to data scaling, which affects its performance under different signal and noise statistics. In this paper, we present a refined criterion called EBICR, where the ‘R’ stands for robust. EBICR is an improved version of EBIC and EFIC: it is scale-invariant and a consistent estimator of the true model as the sample size grows large and/or the SNR tends to infinity. The performance of EBICR is compared to existing methods such as EBIC, EFIC and the multi-beta-test (MBT). Simulation results indicate that EBICR identifies the true model on par with or better than the other considered methods.
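For reference, the classical EBIC that this line of work builds on can be computed as below. This is a hedged sketch of the standard criterion (Chen & Chen, 2008) for a Gaussian linear model, not the proposed EBICR, whose modified penalty is defined in the paper; the function name and argument layout are ours.

```python
import math

def ebic(n, rss, k, p, gamma=1.0):
    """Classical extended BIC for a fitted linear model.

    n     -- number of samples
    rss   -- residual sum of squares of the fitted k-variable model
    k     -- number of selected regressors
    p     -- total number of candidate regressors
    gamma -- extra penalty weight (gamma=0 recovers ordinary BIC)
    """
    return (n * math.log(rss / n)          # Gaussian data-fit term
            + k * math.log(n)              # ordinary BIC penalty
            + 2 * gamma * math.log(math.comb(p, k)))  # extended penalty
```

Among candidate supports, the one minimizing this score is selected; EBICR modifies the data-fit and penalty terms to gain the scale invariance discussed above.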
{"title":"New Improved Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models","authors":"P. B. Gohain, M. Jansson","doi":"10.1109/icassp43922.2022.9746867","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746867","url":null,"abstract":"Extended Bayesian information criterion (EBIC) and extended Fisher information criterion (EFIC) are two popular criteria for model selection in sparse high-dimensional linear regression models. However, EBIC is inconsistent in scenarios when the signal-to-noise-ratio (SNR) is high but the sample size is small, and EFIC is not invariant to data scaling, which affects its performance under different signal and noise statistics. In this paper, we present a refined criterion called EBICR where the ‘R’ stands for robust. EBICR is an improved version of EBIC and EFIC. It is scale-invariant and a consistent estimator of the true model as the sample size grows large and/or when the SNR tends to infinity. The performance of EBICR is compared to existing methods such as EBIC, EFIC and multi-beta-test (MBT). Simulation results indicate that the performance of EBICR in identifying the true model is either at par or superior to that of the other considered methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123621785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
Yuchen Sun, Kejun Huang
We propose a new algorithm called higher-order QR iteration (HOQRI) for computing the Tucker decomposition of large and sparse tensors. Compared to the celebrated higher-order orthogonal iterations (HOOI), HOQRI relies on a simple orthogonalization step in each iteration rather than the more sophisticated singular value decomposition step used in HOOI. More importantly, when dealing with extremely large and sparse data tensors, HOQRI completely eliminates the intermediate memory explosion by defining a new sparse tensor operation called TTMcTC. Furthermore, HOQRI is shown to monotonically improve the objective function, thus enjoying the same convergence guarantee as HOOI. Numerical experiments on synthetic and real data showcase the effectiveness of HOQRI.
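To illustrate the QR-versus-SVD contrast, the sketch below runs HOOI-style sweeps on a small dense tensor but replaces each SVD factor update with a plain QR orthogonalization. This is our loose reading of the idea on a dense toy problem; it does not implement the paper's sparse TTMcTC kernel, and the helper names are ours.

```python
import numpy as np

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def qr_tucker(X, ranks, n_iter=30, seed=0):
    """Toy dense Tucker fit where each factor update is a QR step
    (loosely inspired by HOQRI; not the paper's sparse algorithm)."""
    rng = np.random.default_rng(seed)
    # random orthonormal initial factors
    U = [np.linalg.qr(rng.standard_normal((d, r)))[0] for d, r in zip(X.shape, ranks)]
    for _ in range(n_iter):
        for n in range(X.ndim):
            # contract X with all other factors' transposes
            Y = X
            for m in range(X.ndim):
                if m != n:
                    Y = mode_product(Y, U[m].T, m)
            G = mode_product(Y, U[n].T, n)                    # current core
            Yn = np.moveaxis(Y, n, 0).reshape(X.shape[n], -1)
            Gn = np.moveaxis(G, n, 0).reshape(ranks[n], -1)
            U[n], _ = np.linalg.qr(Yn @ Gn.T)                 # QR instead of an SVD
    G = X
    for m in range(X.ndim):
        G = mode_product(G, U[m].T, m)
    return U, G
```

Each update is one step of subspace iteration on the mode-n unfolding, so for an exactly low-multilinear-rank tensor it recovers the factor subspaces while only ever orthogonalizing a thin matrix.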
{"title":"HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition","authors":"Yuchen Sun, Kejun Huang","doi":"10.1109/icassp43922.2022.9746726","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746726","url":null,"abstract":"We propose a new algorithm called higher-order QR iteration (HO-QRI) for computing the Tucker decomposition of large and sparse tensors. Compared to the celebrated higher-order orthogonal iterations (HOOI), HOQRI relies on a simple orthogonalization step in each iteration rather than a more sophisticated singular value de-composition step as in HOOI. More importantly, when dealing with extremely large and sparse data tensors, HOQRI completely eliminates the intermediate memory explosion by defining a new sparse tensor operation called TTMcTC. Furthermore, HOQRI is shown to monotonically improve the objective function, thus enjoying the same convergence guarantee as that of HOOI. Numerical experiments on synthetic and real data showcase the effectiveness of HOQRI.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123625430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Signal Recovery from Inconsistent Nonlinear Observations
P. L. Combettes, Zev Woodstock
We show that many nonlinear observation models in signal recovery can be represented using firmly nonexpansive operators. To address problems with inaccurate measurements, we propose solving a variational inequality relaxation which is guaranteed to possess solutions under mild conditions and which coincides with the original problem if it happens to be consistent. We then present an efficient algorithm for its solution, as well as numerical applications in signal and image recovery, including an experimental operator-theoretic method of promoting sparsity.
{"title":"Signal Recovery from Inconsistent Nonlinear Observations","authors":"P. L. Combettes, Zev Woodstock","doi":"10.1109/icassp43922.2022.9746145","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746145","url":null,"abstract":"We show that many nonlinear observation models in signal recovery can be represented using firmly nonexpansive operators. To address problems with inaccurate measurements, we propose solving a variational inequality relaxation which is guaranteed to possess solutions under mild conditions and which coincides with the original problem if it happens to be consistent. We then present an efficient algorithm for its solution, as well as numerical applications in signal and image recovery, including an experimental operator-theoretic method of promoting sparsity.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123649550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generation of Personal Sound Fields in Reverberant Environments Using Interframe Correlation
Liming Shi, Guoli Ping, Xiaoxiang Shen, M. G. Christensen
Personal sound field control techniques aim to produce sound fields for different sound contents in different regions of an acoustic space without interference. The limitations of state-of-the-art sound field control methods include high latency and computational complexity, especially when the reverberation time is long and the number of loudspeakers is large. In this paper, we propose a personal sound field control approach that exploits interframe correlation. By taking past frames into account, the proposed method can accommodate long reverberation times with low latency. To find the optimal parameters under physically meaningful constraints, subspace decomposition and Newton's method are applied. Furthermore, a sound-field-distortion-oriented subspace construction method is proposed to reduce the subspace dimension. Simulation results on measured room impulse responses show that, compared with traditional methods, the proposed algorithm obtains a good trade-off between acoustic contrast and reproduction error at low latency.
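For context, the "acoustic contrast" being traded off here is classically maximized by acoustic contrast control (ACC): choose loudspeaker weights that maximize bright-zone energy relative to dark-zone energy via a generalized eigenproblem. The sketch below is that textbook baseline on synthetic transfer matrices, not the paper's interframe-correlation method; all names and dimensions are ours.

```python
import numpy as np

def acc_weights(Gb, Gd, reg=1e-3):
    """Acoustic contrast control: weights q maximizing
    (q^H Gb^H Gb q) / (q^H (Gd^H Gd + reg*I) q),
    solved as a symmetric generalized eigenproblem via Cholesky."""
    Rb = Gb.conj().T @ Gb                                   # bright-zone correlation
    Rd = Gd.conj().T @ Gd + reg * np.eye(Gd.shape[1])       # regularized dark zone
    L = np.linalg.cholesky(Rd)
    Li = np.linalg.inv(L)
    w, V = np.linalg.eigh(Li @ Rb @ Li.conj().T)            # ascending eigenvalues
    q = Li.conj().T @ V[:, -1]                              # top generalized eigenvector
    return q / np.linalg.norm(q)

# synthetic real-valued transfer functions: 8 loudspeakers, 16 mics per zone
rng = np.random.default_rng(0)
Gb = rng.standard_normal((16, 8))
Gd = rng.standard_normal((16, 8))
q = acc_weights(Gb, Gd)
```

The paper's contribution is orthogonal to this static formulation: it reuses correlations across signal frames so the optimization stays tractable under long room impulse responses.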
{"title":"Generation of Personal Sound Fields in Reverberant Environments Using Interframe Correlation","authors":"Liming Shi, Guoli Ping, Xiaoxiang Shen, M. G. Christensen","doi":"10.1109/icassp43922.2022.9747574","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747574","url":null,"abstract":"Personal sound field control techniques aim to produce sound fields for different sound contents in different places of an acoustic space without interference. The limitations of the state-of-the-art methods for sound field control include high latency and computational complexity, especially in the cases when the reverberation time is long and number of loudspeakers is large. In this paper, we propose a personal sound field control approach that exploits interframe correlation. Considering the past frames, the proposed method can accommodate long reverberation time with a low latency. To find the optimal parameters for the physical meaningful constraints, the subspace decomposition and Newton’s method are applied. Furthermore, a sound field distortion oriented subspace construction method is proposed to reduce the subspace dimension. 
Compared with traditional methods, simulation results show that the proposed algorithm is able to obtain a good trade-off between acoustic contrast and reproduction error with a low latency for measured room impulse responses.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"30 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114094179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention
Y. Liu, Jing Liu, Xiaoguang Zhu, Donglai Wei, Xiaohong Huang, Liang Song
The automatic detection of abnormal events in surveillance videos under weak supervision has been formulated as a multiple-instance learning task, which aims to temporally localize the clips containing abnormal events using only video-level labels. However, most existing methods rely on features extracted by pre-trained action recognition models, which are not discriminative enough for video anomaly detection. In this work, we propose a spatial-temporal attention mechanism to learn the inter- and intra-correlations of video clips, and the boosted features are encouraged to be task-specific via a mutual cosine embedding loss. Experimental results on standard benchmarks demonstrate the effectiveness of the spatial-temporal attention, and our method achieves performance superior to state-of-the-art methods.
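Generic scaled dot-product self-attention over per-clip features illustrates the kind of inter-clip correlation modeling described above. This is a plain single-head NumPy sketch, not the paper's specific spatial-temporal module; the function name is ours.

```python
import numpy as np

def clip_self_attention(F):
    """F: (T, d) array of per-clip features. Returns attention-weighted
    features where each clip aggregates information from all T clips."""
    d = F.shape[1]
    scores = F @ F.T / np.sqrt(d)                 # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)             # softmax over clips (rows sum to 1)
    return W @ F                                  # convex combination of clip features
```

In the weakly supervised setting, such attended features feed a clip-level anomaly scorer while only the video-level label supervises training.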
{"title":"Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention","authors":"Y. Liu, Jing Liu, Xiaoguang Zhu, Donglai Wei, Xiaohong Huang, Liang Song","doi":"10.1109/icassp43922.2022.9746822","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746822","url":null,"abstract":"The automatic detection of abnormal events in surveillance videos with weak supervision has been formulated as a multiple instance learning task, which aims to localize the clips containing abnormal events temporally with the video-level labels. However, most existing methods rely on the features extracted by the pre-trained action recognition models, which are not discriminative enough for video anomaly detection. In this work, we propose a spatial-temporal attention mechanism to learn inter- and intra-correlations of video clips, and the boosted features are encouraged to be task-specific via the mutual cosine embedding loss. Experimental results on standard benchmarks demonstrate the effectiveness of the spatial-temporal attention, and our method achieves superior performance to the state-of-the-art methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114310685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis
Ning Wu, Zhaoci Liu, Zhenhua Ling
To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining the VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) used in FastSpeech2 for F0 representation is no longer necessary. Experimental results on a Chinese audiobook dataset show that our proposed method can effectively take advantage of discourse-level linguistic information and outperforms FastSpeech2 on the naturalness and expressiveness of synthetic speech.
{"title":"Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis","authors":"Ning Wu, Zhaoci Liu, Zhenhua Ling","doi":"10.1109/icassp43922.2022.9746238","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746238","url":null,"abstract":"To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) in FastSpeech2 for F0 representation is not necessary anymore. Experimental results on a Chinese audiobook dataset show that our proposed method can effectively take advantage of discourse-level linguistic information and has outperformed FastSpeech2 on the naturalness and expressiveness of synthetic speech.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"54 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116214318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Learning Approach For Fast Approximate Matrix Factorizations
Haiyan Yu, Zhen Qin, Zhihui Zhu
Efficiently computing an (approximate) orthonormal basis and a low-rank approximation of input data X plays a crucial role in data analysis. One of the most efficient algorithms for such tasks is the randomized algorithm, which proceeds by computing a projection XA with a random sketching matrix A of much smaller size, and then computing the orthonormal basis as well as low-rank factorizations of the tall matrix XA. While a random matrix A is the de facto choice, in this work we improve upon its performance by using a learning approach to find an adaptive sketching matrix A from a set of training data. We derive a closed-form expression for the gradient of the training problem, enabling us to use efficient gradient-based algorithms. We also extend this approach to learning structured sketching matrices, such as a sparse sketching matrix that amounts to selecting a small number of representative columns from the input data. Our experiments on both synthetic and real data show that both learned dense and sparse sketching matrices outperform random ones in finding approximate orthonormal bases and low-rank approximations.
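The classical randomized range finder that this abstract takes as its baseline can be sketched as follows; the learned variant replaces the Gaussian draw of A with a trained matrix. The function names are ours.

```python
import numpy as np

def randomized_basis(X, r, oversample=5, seed=0):
    """Halko-style randomized range finder: sketch X with a random A,
    then orthonormalize the projection XA by QR."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], r + oversample))  # random sketching matrix
    Q, _ = np.linalg.qr(X @ A)                             # orthonormal basis of range(XA)
    return Q

def low_rank_approx(X, r):
    """Project X onto the sketched range: X ~= Q (Q^T X)."""
    Q = randomized_basis(X, r)
    return Q @ (Q.T @ X)
```

Only one pass over X and a thin QR are needed, which is why the quality of the sketching matrix A (random versus learned) directly controls the approximation error.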
{"title":"Learning Approach For Fast Approximate Matrix Factorizations","authors":"Haiyan Yu, Zhen Qin, Zhihui Zhu","doi":"10.1109/icassp43922.2022.9747165","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747165","url":null,"abstract":"Efficiently computing an (approximate) orthonormal basis and low-rank approximation for the input data X plays a crucial role in data analysis. One of the most efficient algorithms for such tasks is the randomized algorithm, which proceeds by computing a projection XA with a random sketching matrix A of much smaller size, and then computing the orthonormal basis as well as low-rank factorizations of the tall matrix XA. While a random matrix A is the de facto choice, in this work, we improve upon its performance by utilizing a learning approach to find an adaptive sketching matrix A from a set of training data. We derive a closed-form formulation for the gradient of the training problem, enabling us to use efficient gradient-based algorithms. We also extend this approach for learning structured sketching matrix, such as the sparse sketching matrix that performs as selecting a few number of representative columns from the input data. 
Our experiments on both synthetical and real data show that both learned dense and sparse sketching matrices outperform the random ones in finding the approximate orthonormal basis and low-rank approximations.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121485540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NEX+: Novel View Synthesis with Neural Regularisation Over Multi-Plane Images
Wenpeng Xing, Jie Chen
We propose Nex+, a neural multi-plane image (MPI) representation with alpha denoising for the task of novel view synthesis (NVS). Overfitting to training data is a common challenge for all learning-based models. We propose a novel solution for resolving this issue in the context of NVS via signal-denoising-motivated operations on the alpha coefficients of the MPI, without any additional supervision. Nex+ contains a novel 5D Alpha Neural Regulariser (ANR), which favors low-frequency components in the angular domain, i.e., in the alpha coefficients' signal subspace indicating various viewing directions. ANR's angular low-frequency property derives from its small number of angular encoding levels and output bases. The regularised alpha in Nex+ models the scene geometry more accurately than Nex, and outperforms other state-of-the-art methods on public datasets for NVS.
{"title":"NEX+: Novel View Synthesis with Neural Regularisation Over Multi-Plane Images","authors":"Wenpeng Xing, Jie Chen","doi":"10.1109/icassp43922.2022.9746938","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746938","url":null,"abstract":"We propose Nex+, a neural Multi-Plane Image (MPI) representation with alpha denoising for the task of novel view synthesis (NVS). Overfitting to training data is a common challenge for all learning-based models. We propose a novel solution for resolving such issue in the context of NVS with signal denoising-motivated operations over the alpha coefficients of the MPI, without any additional requirements for supervision. Nex+ contains a novel 5D Alpha Neural Regulariser (ANR), which favors low-frequency components in the angular domain, i.e., the alpha coefficients’ signal sub-space indicating various viewing directions. ANR’s angular low-frequency property derives from its small number of angular encoding levels and output basis. The regularised alpha in Nex+ can model the scene geometry more accurately than Nex, and outperforms other state-of-the-art methods on public datasets for the task of NVS.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"87 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114002135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3