2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers
Reduced Dimension Minimum BER PSK Precoding for Constrained Transmit Signals in Massive MIMO
A. L. Swindlehurst, H. Jedda, I. Fijalkow
Recently a number of nonlinear precoding algorithms have been developed for designing a downlink transmit signal that is constrained by some nonlinearity, such as one-bit quantization, power-amplifier saturation or constant modulus. These methods use iterative search algorithms to directly design the signal that is transmitted from each antenna. Since the dimension of the search space equals the number of antennas, the computational complexity of these approaches can be high for massive MIMO scenarios. Thus, in this paper we pose the problem in a smaller dimensional space by constraining the signal prior to the nonlinearity to be the output of a linear precoder. The search is then over the vector of predistorted symbols at the input to the linear precoder, which is typically much smaller than the number of antennas. We focus on algorithms that minimize the bit error rate at the receivers, and show that performance can be obtained that is similar to algorithms that operate directly in the antenna domain.
DOI: https://doi.org/10.1109/ICASSP.2018.8461642 | Pages: 3584-3588 | Published: 2018-11-30
Citations: 11
Low Complexity Joint RDO of Prediction Units Couples for HEVC Intra Coding
Maxime Bichon, J. L. Tanou, M. Ropert, W. Hamidouche, L. Morin, Lu Zhang
HEVC is the latest block-based video compression standard, outperforming H.264/AVC by 50% bitrate savings for the same perceptual quality. An HEVC encoder provides Rate-Distortion optimization coding tools for block-wise compression. Because of complexity limitations, Rate-Distortion Optimization (RDO) is usually performed independently for each block, assuming coding efficiency losses to be negligible. In this paper, we propose an acceleration solution for the Intra coding scheme named Dual-JRDO, which takes advantage of Inter-Block dependencies related to both predictive coding and CABAC. The Dual-JRDO improves Intra coding efficiency at the expense of higher computational complexity. The acceleration of the Dual-JRDO scheme includes adaptive use of the Dual-JRDO model based on source analysis, short-listing and early decisions strategies. The proposed Fast Dual-JRDO reduces the original model complexity by 89.54%, while providing tractable computation for average R-D gains of −0.45% (up to −0.82%) in the HM16.12 reference software model.
DOI: https://doi.org/10.1109/ICASSP.2018.8462489 | Pages: 1733-1737 | Published: 2018-10-08
Citations: 5
Non-Native Children Speech Recognition Through Transfer Learning
M. Matassoni, R. Gretter, D. Falavigna, D. Giuliani
This work deals with non-native children's speech and investigates both multi-task and transfer learning approaches to adapt a multi-language Deep Neural Network (DNN) to speakers, specifically children, learning a foreign language. The application scenario is characterized by young students learning English and German and reading sentences in these second languages, as well as in their mother language. The paper analyzes and discusses techniques for training effective DNN-based acoustic models starting from children's native speech and performing adaptation with limited non-native audio material. A multilingual model is adopted as the baseline, where a common phonetic lexicon, defined in terms of the units of the International Phonetic Alphabet (IPA), is shared across the three languages at hand (Italian, German and English); DNN adaptation methods based on transfer learning are evaluated on significant non-native evaluation sets. Results show that the resulting non-native models allow a significant improvement with respect to a mono-lingual system adapted to speakers of the target language.
DOI: https://doi.org/10.1109/ICASSP.2018.8462059 | Pages: 6229-6233 | Published: 2018-09-25
Citations: 36
Ranking Using Transition Probabilities Learned from Multi-Attribute Data
Sigurd Løkse, R. Jenssen
In this paper, as a novel approach, we learn Markov chain transition probabilities for ranking of multi-attribute data from the inherent structures in the data itself. The procedure is inspired by consensus clustering and exploits a suitable form of the PageRank algorithm. This is very much in the spirit of the original PageRank utilizing the hyperlink structure to learn such probabilities. As opposed to existing approaches for ranking multi-attribute data, our method is not dependent on tuning of critical user-specified parameters. Experiments show the benefits of the proposed method.
DOI: https://doi.org/10.1109/ICASSP.2018.8462132 | Pages: 2851-2855 | Published: 2018-09-13
Citations: 0
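As a rough illustration of the PageRank step mentioned in the abstract above, the sketch below ranks the states of a Markov chain by power iteration. It assumes the transition matrix is already given; the paper learns these probabilities from the data, and that learning step is not reproduced here. The function name `pagerank`, the damping factor 0.85, and the 3-state matrix `P` are all hypothetical choices made for this example.

```python
def pagerank(transition, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank states of a Markov chain by damped power iteration.

    `transition` is a row-stochastic matrix given as a list of lists:
    transition[i][j] is the probability of moving from state i to state j.
    """
    n = len(transition)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(rank[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical 3-state chain (each row sums to 1); not data from the paper.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
scores = pagerank(P)
# Indices of the states, ordered from highest to lowest score.
ranking = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
```

Here `scores` stays a probability distribution at every iteration, so the entries can be compared directly to order the items.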
Synthesis of Images by Two-Stage Generative Adversarial Networks
Qiang Huang, P. Jackson, Mark D. Plumbley, Wenwu Wang
In this paper, we propose a divide-and-conquer approach using two generative adversarial networks (GANs) to explore how a machine can draw colorful pictures (bird) using a small amount of training data. In our work, we simulate the procedure of an artist drawing a picture, where one begins with drawing objects' contours and edges and then paints them different colors. We adopt two GAN models to process basic visual features including shape, texture and color. We use the first GAN model to generate object shape, and then paint the black and white image based on the knowledge learned using the second GAN model. We run our experiments on 600 color images. The experimental results show that the use of our approach can generate good quality synthetic images, comparable to real ones.
DOI: https://doi.org/10.1109/ICASSP.2018.8461984 | Pages: 1593-1597 | Published: 2018-09-13
Citations: 2
Pulse-Stream Models in Time-of-Flight Imaging
Adrien Besson, Dimitris Perdios, Y. Wiaux, J. Thiran
This paper considers the problem of reconstructing raw signals from random projections in the context of time-of-flight imaging with an array of sensors. It presents a new signal model, coined as multi-channel pulse-stream model, which exploits pulse-stream models and accounts for additional structure induced by inter-sensor dependencies. We propose a sampling theorem and a reconstruction algorithm, based on ℓ -minimization, for signals belonging to such a model. We demonstrate the benefits of the proposed approach by means of numerical simulations and on a real non-destructive-evaluation application where the peak-signal-to-noise-ratio is increased by 3 dB compared to standard compressed-sensing strategies.
DOI: https://doi.org/10.1109/ICASSP.2018.8461767 | Pages: 3389-3393 | Published: 2018-09-13
Citations: 2
EMG Acquisition and Hand Pose Classification for Bionic Hands from Randomly-Placed Sensors
Sumit A. Raurale, J. McAllister, J. M. D. Rincón
This paper presents a unique real-time motion recognition system for Electromyographic (EMG) signal acquisition and classification. It is the first approach which can classify hand poses from multi-channel EMG signals gathered from randomly placed arm sensors as accurately as current placed-sensor EMG acquisition approaches. It combines time-domain feature extraction, Linear Discriminant Analysis (LDA) feature projection and Multilayer Perceptron (MLP) classification to allow nine distinct poses to be correctly identified more than 95% of the time. This is comparable to state-of-the-art placed-sensor EMG acquisition systems. Processing times of 11.70 ms also make this a viable candidate approach for real-time EMG acquisition and processing in practical prosthesis applications.
DOI: https://doi.org/10.1109/ICASSP.2018.8462409 | Pages: 1105-1109 | Published: 2018-09-13
Citations: 17
Statistical t+2D Subband Modelling for Crowd Counting
Deepayan Bhowmik, A. Wallace
Counting people automatically in a crowded scenario is important to assess safety and to determine behaviour in surveillance operations. In this paper we propose a new algorithm using the statistics of the spatio-temporal wavelet subbands. A t+2D lifting based wavelet transform is exploited to generate a motion saliency map which is then used to extract novel parametric statistical texture features. We compare our approach to existing crowd counting approaches and show improvement on standard benchmark sequences, demonstrating the robustness of the extracted features.
DOI: https://doi.org/10.1109/ICASSP.2018.8462345 | Pages: 1533-1537 | Published: 2018-09-13
Citations: 1
Inexact Proximal Operators for $\ell_{p}$-Quasinorm Minimization
Cian O'Brien, Mark D. Plumbley
Proximal methods are an important tool in signal processing applications, where many problems can be characterized by the minimization of an expression involving a smooth fitting term and a convex regularization term, for example the classic $\ell_{1}$-Lasso. Such problems can be solved using the relevant proximal operator. Here we consider the use of proximal operators for the $\ell_{p}$-quasinorm where $0 \leq p \leq 1$. Rather than seek a closed form solution, we develop an iterative algorithm using a Majorization-Minimization procedure which results in an inexact operator. Experiments on image denoising show that for $p \leq 1$ the algorithm is effective in the high-noise scenario, outperforming the Lasso despite the inexactness of the proximal step.
DOI: https://doi.org/10.1109/ICASSP.2018.8462524 | Pages: 4724-4728 | Published: 2018-09-13
Citations: 0
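For context on the "relevant proximal operator" mentioned in the abstract above, the sketch below shows the well-known exact case $p = 1$: the proximal operator of $\lambda \|x\|_{1}$, i.e. the minimizer of $\tfrac{1}{2}\|x - v\|_{2}^{2} + \lambda \|x\|_{1}$, is element-wise soft thresholding. This is only the convex baseline used by the Lasso, not the paper's inexact Majorization-Minimization operator for $p < 1$; the function names `soft_threshold` and `prox_l1` and the sample vector are hypothetical.

```python
def soft_threshold(x, lam):
    """Scalar soft thresholding: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0  # |x| <= lam is mapped exactly to zero


def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1, applied element-wise to list v."""
    return [soft_threshold(x, lam) for x in v]


# Hypothetical input: large entries shrink by lam, small entries vanish.
shrunk = prox_l1([3.0, -0.5, -2.0], 1.0)
```

Entries with magnitude at most `lam` are set to zero, which is the mechanism that makes $\ell_{1}$ regularization promote sparsity.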
Speech Segment Clustering for Real-Time Exemplar-Based Speech Enhancement
David Nesbitt, D. Crookes, J. Ming
Exemplar-based (or Corpus-based) speech enhancement algorithms have great potential but are typically slow due to needing to search through the entire corpus. The properties of speech can be exploited to improve these algorithms. Firstly, a corpus can be clustered by a phonetic ordering into a search tree which can be used to find a best matching segment. This dramatically reduces the search space, reducing the time complexity of searching a corpus of $n$ segments from $O(n)$ to $O(\log n)$. Secondly, clustering can be used to give a lossy compression of a speech corpus by replacing original segments with codewords. These techniques are shown in comparison with sequential search and non-compressed corpora using a simple speech enhancement algorithm. A combination of these techniques for a corpus of a quarter of WSJ0 results in a speedup of approximately 3000x.
DOI: https://doi.org/10.1109/ICASSP.2018.8461689 | Pages: 5419-5423 | Published: 2018-09-13
Citations: 2