
Latest publications: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning
Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu
This paper aims to address two issues existing in the current speech enhancement methods: 1) the difficulty of phase estimations; 2) a single objective function cannot consider multiple metrics simultaneously. To solve the first problem, we propose a novel convolutional neural network (CNN) model for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones. The reconstructed RI spectrograms are directly used to synthesize enhanced speech waveforms. In addition, since log-power spectrogram (LPS) can be represented as a function of RI spectrograms, its reconstruction is also considered as another target. Thus a unified objective function, which combines these two targets (reconstruction of RI spectrograms and LPS), is equivalent to simultaneously optimizing two commonly used objective metrics: segmental signal-to-noise ratio (SSNR) and log-spectral distortion (LSD). Therefore, the learning process is called multi-metrics learning (MML). Experimental results confirm the effectiveness of the proposed CNN with RI spectrograms and MML in terms of improved standardized evaluation metrics on a speech enhancement task.
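Since the LPS is a deterministic function of the RI spectrograms, the unified objective can be sketched in a few lines. This is an illustrative sketch, not the paper's exact formulation: the `alpha` weighting, the mean-squared-error form, and the `eps` stabilizer are assumptions.

```python
import numpy as np

def mml_loss(ri_est, ri_clean, alpha=0.5, eps=1e-8):
    """Multi-metrics learning sketch: combine the two reconstruction
    targets from the abstract. `ri_*` are arrays of shape (2, F, T)
    holding real and imaginary spectrograms. `alpha` is an assumed
    trade-off weight, not a value from the paper."""
    # RI reconstruction error (relates to segmental SNR)
    ri_err = np.mean((ri_est - ri_clean) ** 2)
    # LPS is derived from RI: log(R^2 + I^2); eps avoids log(0)
    lps_est = np.log(ri_est[0] ** 2 + ri_est[1] ** 2 + eps)
    lps_clean = np.log(ri_clean[0] ** 2 + ri_clean[1] ** 2 + eps)
    # LPS reconstruction error (relates to log-spectral distortion)
    lps_err = np.mean((lps_est - lps_clean) ** 2)
    return alpha * ri_err + (1 - alpha) * lps_err
```

Optimizing this single scalar thus pushes the network toward both SSNR-like and LSD-like criteria at once, which is the point of combining the two targets.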
DOI: 10.1109/MLSP.2017.8168119 · pp. 1-6 · Published: 2017-04-27
Citations: 151
Scene-Adapted plug-and-play algorithm with convergence guarantees
Afonso M. Teodoro, J. Bioucas-Dias, Mário A. T. Figueiredo
Recent frameworks, such as the so-called plug-and-play, allow us to leverage the developments in image denoising to tackle other, and more involved, problems in image processing. As the name suggests, state-of-the-art denoisers are plugged into an iterative algorithm that alternates between a denoising step and the inversion of the observation operator. While these tools offer flexibility, the convergence of the resulting algorithm may be difficult to analyse. In this paper, we plug a state-of-the-art denoiser, based on a Gaussian mixture model, in the iterations of an alternating direction method of multipliers and prove the algorithm is guaranteed to converge. Moreover, we build upon the concept of scene-adapted priors where we learn a model targeted to a specific scene being imaged, and apply the proposed method to address the hyperspectral sharpening problem.
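The alternation the abstract describes (data-fit inversion step, then a plugged-in denoiser) follows the standard plug-and-play ADMM template. The sketch below uses a generic linear observation operator and a caller-supplied `denoise` function standing in for the paper's GMM-based denoiser; the direct normal-equations solve is an assumption for illustration.

```python
import numpy as np

def pnp_admm(y, A, denoise, rho=1.0, iters=50):
    """Plug-and-play ADMM sketch for y = A x + noise.
    `denoise` is any off-the-shelf denoiser v -> v_hat (in the paper,
    a GMM-based denoiser); here it is simply plugged into the z-update."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # Precompute the regularized normal-equations matrix for the x-update
    AtA = A.T @ A + rho * np.eye(n)
    Aty = A.T @ y
    for _ in range(iters):
        x = np.linalg.solve(AtA, Aty + rho * (z - u))  # inversion step
        z = denoise(x + u)                             # denoising step
        u = u + x - z                                  # dual update
    return x
```

The paper's contribution is precisely that, with the GMM denoiser in the z-update, this iteration can be proven to converge, which is not true for arbitrary plugged-in denoisers.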
DOI: 10.1109/MLSP.2017.8168194 · pp. 1-6 · Published: 2017-02-08
Citations: 29
Multi-Objective contextual bandits with a dominant objective
Cem Tekin, E. Turğay
In this paper, we propose a new contextual bandit problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective bandit problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives. The goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its reward in the dominant objective. In this case, the optimal arm given a context is the one that maximizes the expected reward in the non-dominant objective among all arms that maximize the expected reward in the dominant objective. For this problem, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and prove that it achieves sublinear regret with respect to the optimal context dependent policy. Then, we compare the performance of the proposed algorithm with other state-of-the-art bandit algorithms. The proposed contextual bandit model and the algorithm have a wide range of real-world applications that involve multiple and possibly conflicting objectives ranging from wireless communication to medical diagnosis and recommender systems.
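The optimality criterion in the abstract (maximize the non-dominant objective over the set of arms optimal in the dominant objective) reduces to a lexicographic selection rule. A minimal sketch on known expected rewards, omitting the exploration terms that the actual MOC-MAB algorithm would add:

```python
import numpy as np

def select_arm(expected_rewards, tol=1e-9):
    """Lexicographic arm selection: among arms that maximize the
    dominant objective (column 0), pick the one maximizing the
    non-dominant objective (column 1). `tol` is an assumed tolerance
    for ties in the dominant objective."""
    rewards = np.asarray(expected_rewards, dtype=float)
    dominant = rewards[:, 0]
    # Candidate set: arms (near-)optimal in the dominant objective
    candidates = np.flatnonzero(dominant >= dominant.max() - tol)
    # Break ties by the non-dominant objective
    best = candidates[np.argmax(rewards[candidates, 1])]
    return int(best)
```

In the bandit setting the expected rewards are unknown and context-dependent, so MOC-MAB must estimate them per context and balance exploration against this selection rule; the sketch only shows the target the regret is measured against.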
DOI: 10.1109/MLSP.2017.8168123 · pp. 1-6 · Published: 2017-01-01
Citations: 9
Learning guided convolutional neural networks for cross-resolution face recognition
Tzu-Chien Fu, Wei-Chen Chiu, Y. Wang
Cross-resolution face recognition tackles the problem of matching face images with different resolutions. Although state-of-the-art convolutional neural network (CNN) based methods have reported promising performances on standard face recognition problems, such models cannot sufficiently describe images with resolution different from those seen during training, and thus cannot solve the above task accordingly. In this paper, we propose Guided Convolutional Neural Network (Guided-CNN), which is a novel CNN architecture with parallel sub-CNN models as guide and learners. Unique loss functions are introduced, which would serve as joint supervision for images within and across resolutions. Our experiments not only verify the use of our model for cross-resolution recognition, but also its applicability of recognizing face images with different degrees of occlusion.
DOI: 10.1109/MLSP.2017.8168180 · pp. 1-5 · Published: 2017-01-01
Citations: 13
Domain-Adaptive generative adversarial networks for sketch-to-photo inversion
Yen-Cheng Liu, Wei-Chen Chiu, Sheng-De Wang, Y. Wang
Generating photo-realistic images from multiple styles of sketches is one of the challenging tasks in image synthesis, with important applications such as facial composites for suspects. While machine learning techniques have been applied to solve this problem, the requirement of collecting sketch and face photo image pairs would limit the use of the learned model for rendering sketches of different styles. In this paper, we propose a novel deep learning model of Domain-adaptive Generative Adversarial Networks (DA-GAN). The design of DA-GAN performs cross-style sketch-to-photo inversion, which mitigates the difference across input sketch styles without the need to collect a large number of sketch and face image pairs for training purposes. In experiments, we show that our method is able to produce satisfactory results while performing favorably against state-of-the-art approaches.
DOI: 10.1109/MLSP.2017.8168181 · pp. 1-6 · Published: 2017-01-01
Citations: 7
Adversarial domain separation and adaptation
Jen-Chieh Tsai, Jen-Tzung Chien
Traditional domain adaptation methods attempted to learn a shared representation for distribution matching between the source domain and the target domain, where the individual information in the two domains was not characterized. Such a solution suffers from the mixing of individual information into the shared features, which considerably constrains domain adaptation performance. To relax this constraint, it is crucial to extract both shared information and individual information. This study captures both kinds of information via a new domain separation network, where the shared features are extracted and purified via separate modeling of the individual information in both domains. In particular, hybrid adversarial learning is incorporated in a separation network as well as an adaptation network, where the associated discriminators are jointly trained for domain separation and adaptation according to minimax optimization over the separation loss and the domain discrepancy, respectively. Experiments on different tasks show the merit of the proposed adversarial domain separation and adaptation.
DOI: 10.1109/MLSP.2017.8168121 · pp. 1-6 · Published: 2017-01-01
Citations: 18
Iterative data-driven coronary vessel labeling
Tsaipei Wang
This paper describes an iterative data-driven algorithm for automatically labeling coronary vessel segments in MDCT images. Such techniques are useful for effective presentation and communication of findings on coronary vessel pathology by physicians and computer-assisted diagnosis systems. The experiments are conducted on the 18 sets of coronary vessel data in the Rotterdam Coronary Artery Algorithm Evaluation Framework that contain segment labeling by medical experts. Our algorithm shows both good accuracy and efficiency compared to previous work on this task.
DOI: 10.1109/MLSP.2017.8168190 · pp. 1-6 · Published: 2017-01-01
Citations: 0
Memory augmented neural network for source separation
K. Tsou, Jen-Tzung Chien
Recurrent neural networks (RNNs) based on long short-term memory (LSTM) have been successfully developed for single-channel source separation. Temporal information is learned by using dynamic states which evolve through time and are stored as an internal memory. The performance of source separation is constrained by the limitation of this internal memory, which cannot sufficiently preserve long-term characteristics from different sources. This study addresses this limitation by incorporating an external memory in the RNN and accordingly presents a memory augmented neural network for source separation. In particular, we employ a neural Turing machine to learn a separation model for sequential signals of speech and noise in the presence of different speakers and noise types. Experiments show that speech enhancement based on the memory augmented neural network consistently outperforms that using a deep neural network and an LSTM in terms of the short-term objective intelligibility measure.
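The external memory the abstract refers to is read through content-based addressing, as in a neural Turing machine. The sketch below shows that read mechanism only (the write head, the learned controller, and the exact parameterization are beyond this sketch; the sharpening parameter `beta` is an assumption):

```python
import numpy as np

def content_read(memory, key, beta=5.0):
    """Content-based read from an external memory matrix (slots x dims):
    cosine similarity between the read key and each slot, sharpened by
    `beta`, normalized by softmax, then used to take an attention-weighted
    sum of the slots."""
    # Cosine similarity between key and every memory row
    mem_norm = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    key_norm = key / (np.linalg.norm(key) + 1e-8)
    sim = mem_norm @ key_norm
    # Sharpen and normalize into attention weights
    w = np.exp(beta * sim)
    w /= w.sum()
    return w @ memory  # read vector
```

Because the memory matrix persists across time steps independently of the RNN state, reads like this can recover source characteristics from much earlier in the signal than an LSTM cell state would retain.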
DOI: 10.1109/MLSP.2017.8168120 · pp. 1-6 · Published: 2017-01-01
Citations: 11
End-to-End learning of cost-volume aggregation for real-time dense stereo
Andrey Kuzmin, Dmitry Mikushin, V. Lempitsky
We present a new deep learning-based approach for dense stereo matching. Compared to previous works, our approach does not use deep learning of pixel appearance descriptors, employing very fast classical matching scores instead. At the same time, our approach uses a deep convolutional network to predict the local parameters of the cost-volume aggregation process, which in this paper we implement using the differentiable domain transform. By treating such a transform as a recurrent neural network, we are able to train our whole system, which includes cost-volume computation, cost-volume aggregation (smoothing), and winner-takes-all disparity selection, end-to-end. The resulting method is highly efficient at test time, while achieving good matching accuracy. On the KITTI 2012 and KITTI 2015 benchmarks, it achieves error rates of 5.08% and 6.34% respectively, while running at 29 frames per second on a modern GPU.
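The domain transform that aggregates the cost volume is, at its core, a first-order recursive filter run forward and backward over each scanline, with per-position feedback weights; it is these weights that the CNN predicts. A 1-D sketch under that reading (the symmetric forward/backward scheme and the weight convention are assumptions, not the paper's exact recursion):

```python
import numpy as np

def domain_transform_1d(signal, weights):
    """Edge-aware recursive smoothing of one cost-volume scanline:
    y[i] = (1 - a[i]) * x[i] + a[i] * y[i-1], run forward then backward.
    `weights` a[i] in [0, 1) control how much cost is propagated across
    position i; in the paper these are predicted by a CNN."""
    x = np.asarray(signal, dtype=float)
    a = np.asarray(weights, dtype=float)
    y = x.copy()
    for i in range(1, len(y)):            # forward pass
        y[i] = (1 - a[i]) * y[i] + a[i] * y[i - 1]
    for i in range(len(y) - 2, -1, -1):   # backward pass (symmetric smoothing)
        y[i] = (1 - a[i + 1]) * y[i] + a[i + 1] * y[i + 1]
    return y
```

Because each output is a differentiable function of the inputs and the weights, the recursion can be unrolled like an RNN and gradients propagated back into the weight-predicting CNN, which is what enables the end-to-end training the abstract describes.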
DOI: 10.1109/MLSP.2017.8168183 · pp. 1-6 · Published: 2016-11-17
Citations: 26
Parallelizable sparse inverse formulation Gaussian processes (SpInGP)
A. Grigorievskiy, Neil D. Lawrence, S. Särkkä
We propose a parallelizable sparse inverse formulation Gaussian process (SpInGP) for temporal models. It uses a sparse precision GP formulation and sparse matrix routines to speed up the computations. Due to the state-space formulation used in the algorithm, the time complexity of the basic SpInGP is linear, and because all the computations are parallelizable, the parallel form of the algorithm is sublinear in the number of data points. We provide example algorithms to implement the sparse matrix routines and experimentally test the method using both simulated and real data.
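The computational gain comes from the state-space view of temporal GPs: the prior precision matrix is (block-)tridiagonal, so posterior computations reduce to sparse linear solves. A minimal sketch of the posterior-mean solve under a Gaussian noise model (the scalar tridiagonal precision and this particular formula are illustrative assumptions, not SpInGP's full algorithm):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def sparse_gp_posterior_mean(y, noise_var, Q):
    """Given a sparse prior precision Q (tridiagonal for a temporal GP in
    state-space form) and i.i.d. Gaussian observation noise, the posterior
    mean m solves (Q + I / noise_var) m = y / noise_var. A sparse direct
    solve keeps the cost linear in the number of data points."""
    n = len(y)
    A = (Q + sparse.eye(n) / noise_var).tocsc()
    return spsolve(A, np.asarray(y) / noise_var)
```

A dense Cholesky on the covariance would cost O(n^3); exploiting the banded precision is what makes the basic algorithm linear in n, and the paper further parallelizes these sparse routines to reach sublinear parallel time.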
DOI: 10.1109/MLSP.2017.8168130 · pp. 1-6 · Published: 2016-10-25
Citations: 20