
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Improvements to filterbank and delta learning within a deep neural network framework
Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, G. Saon, B. Ramabhadran
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is, minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.
DOI: https://doi.org/10.1109/ICASSP.2014.6854925 | Pages: 6839-6843 | Published: 2014-05-04
Citations: 12
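For context on the "fixed set of deltas" that the abstract above uses as its baseline: deltas are conventionally computed with a regression formula over neighbouring frames. The following is a minimal sketch of that conventional computation, not of the learned deltas proposed in the paper. The function name `delta`, the window of 2 frames, and the edge padding are illustrative assumptions.

```python
import numpy as np

def delta(features: np.ndarray, window: int = 2) -> np.ndarray:
    """Fixed regression deltas along time for a (frames, dims) array.

    Hypothetical helper: window size and edge padding are assumptions.
    """
    num_frames = features.shape[0]
    # Repeat the edge frames so every frame has `window` neighbours per side.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom
```

Applying `delta` twice would give the usual second-order ("delta-delta") features under the same assumptions.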
Frequency-shift filtering for OFDM recovery in narrowband power line communications
Nir Shlezinger, R. Dabora
Power line communications (PLC) has been drawing considerable interest in recent years due to the growing interest in smart grid implementation. In smart grids, network control and grid applications are allocated the frequency band of 0-500 kHz, commonly referred to as the narrowband PLC channel. This channel is characterized by strong periodic noise and a low signal-to-noise ratio (SNR). In this work we propose a receiver that uses frequency-shift filtering to exploit the cyclostationary properties of both the narrowband PLC noise and the information signal, which is digitally modulated using orthogonal frequency division multiplexing (OFDM). The results show that the new receiver obtains a substantial performance gain over previously proposed receivers, without requiring any coordination with the transmitter.
DOI: https://doi.org/10.1109/ICASSP.2014.6855173 | Pages: 8073-8077 | Published: 2014-05-04
Citations: 1
Reconstruction of sparse signals from highly corrupted measurements by nonconvex minimization
Marko Filipovic
We propose a method for signal recovery in compressed sensing when measurements can be highly corrupted. It is based on ℓp minimization for 0 < p ≤ 1. Since it was shown that ℓp minimization performs better than ℓ1 minimization when there are no large errors, the proposed approach is a natural extension to compressed sensing with corruptions. We provide a theoretical justification of this idea, based on analogous reasoning as in the case when measurements are not corrupted by large errors. Better performance of the proposed approach compared to ℓ1 minimization is illustrated in numerical experiments.
DOI: https://doi.org/10.1109/ICASSP.2014.6854230 | Pages: 3395-3399 | Published: 2014-05-04
Citations: 5
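To illustrate why an ℓp cost with p < 1 is attractive for sparse recovery, as the abstract above assumes, the sketch below compares the cost of a sparse and a dense vector that have the same ℓ1 cost. It only evaluates the cost function; it is not the recovery method of the paper. The name `lp_cost` and the two example vectors are hypothetical.

```python
import numpy as np

def lp_cost(x, p: float) -> float:
    """Return sum(|x_i|^p), the l_p cost for 0 < p <= 1 (hypothetical helper)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return float(np.sum(np.abs(np.asarray(x, dtype=float)) ** p))

# Two illustrative vectors with identical l_1 cost.
sparse = np.array([1.0, 0.0, 0.0, 0.0])
dense = np.array([0.25, 0.25, 0.25, 0.25])

for p in (1.0, 0.5):
    print(p, lp_cost(sparse, p), lp_cost(dense, p))
```

With p = 1 the two vectors are indistinguishable, while with p = 0.5 the dense vector costs more, so minimizing the ℓp cost leans toward the sparse one. The price is that the cost is nonconvex for p < 1.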
A computationally efficient calibration algorithm for the LOFAR radio astronomical array
Yuntao Wu, Amir Leshem, S. Wijnholds
In this paper, the problem of self-calibration for large astronomical arrays such as the Dutch Low Frequency Array (LOFAR) is considered. We assume direction-dependent gain and phase errors, which need to be estimated and calibrated out. Combining the subspace fitting and least squares approaches, the signal subspace of the received single short-term interval (STI) sample data of the LOFAR is used to build a cost function whose minimizer is a statistically efficient estimator of the unknown parameters: the gains and phases of the telescopes. Subsequently, an iterative algorithm for finding the minimum of the cost function is presented, and the unknown calibration parameters of the core stations and the external subarray are separated. As a result, the computational complexity of the proposed method is significantly reduced compared to existing methods based on direct covariance fitting. Finally, the performance of the proposed method is compared with the conventional peeling method in computer simulations. An example of calibrating the core of the LOFAR array on Cyg A is also provided.
DOI: https://doi.org/10.1109/ICASSP.2014.6854635 | Pages: 5402-5406 | Published: 2014-05-04
Citations: 1
Retrieving the syntactic structure of erroneous ASR transcriptions for open-domain Spoken Language Understanding
Frédéric Béchet, Benoit Favre, Alexis Nasr, Mathieu Morey
Retrieving the syntactic structure of erroneous ASR transcriptions can be of great interest for open-domain Spoken Language Understanding (SLU) tasks, in order to correct or at least reduce the impact of ASR errors on final applications. Most previous work on ASR and syntactic parsing has addressed this problem by using syntactic features during ASR to help reduce Word Error Rate (WER). The improvement obtained is often rather small. However, the structure and the relations between words obtained through parsing can be of great interest for SLU processes, even without a significant decrease in WER. That is why we adopt another point of view in this paper: considering that ASR transcriptions inevitably contain some errors, we show that it is possible to improve the syntactic analysis of these erroneous transcriptions by performing a joint error detection / syntactic parsing process. The applicative framework used in this study is a speech-to-speech system developed through the DARPA BOLT project.
DOI: https://doi.org/10.1109/ICASSP.2014.6854372 | Pages: 4097-4101 | Published: 2014-05-04
Citations: 4
Investigation of unsupervised adaptation of DNN acoustic models with filter bank input
Takuya Yoshioka, A. Ragni, M. Gales
Adaptation to speaker variations is an essential component of speech recognition systems. One common approach to adapting deep neural network (DNN) acoustic models is to perform global constrained maximum likelihood linear regression (CMLLR) at some point in the system. Using CMLLR (or, more generally, generative approaches) is advantageous especially in unsupervised adaptation scenarios with high baseline error rates. On the other hand, as DNNs are less sensitive to increases in input dimensionality than GMMs, it is becoming more popular to use rich speech representations, such as log mel-filter bank channel outputs, instead of conventional low-dimensional feature vectors, such as MFCCs and PLP coefficients. This work discusses and compares three different configurations of DNN acoustic models that allow CMLLR-based speaker adaptive training (SAT) to be performed in systems with filter bank inputs. Results of unsupervised adaptation experiments conducted on three different data sets are presented, demonstrating that, by choosing an appropriate configuration, SAT with CMLLR can improve the performance of a well-trained filter bank-based speaker-independent DNN system by 10.6% relative in a challenging task with a baseline error rate above 40%. It is also shown that filter bank features are more advantageous than conventional features even when they are used with SAT models. Some other insights are also presented, including the effects of block diagonal transforms and system combination.
DOI: https://doi.org/10.1109/ICASSP.2014.6854825 | Pages: 6344-6348 | Published: 2014-05-04
Citations: 42
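The abstract above reports a relative improvement rather than an absolute one. As a small worked example of the arithmetic, the sketch below converts between the two. The baseline of exactly 40.0% is an assumption for illustration only (the abstract says just "above 40%"), and both function names are hypothetical.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Fraction by which the error rate dropped, relative to the baseline."""
    return (baseline - improved) / baseline

def apply_relative_improvement(baseline: float, relative: float) -> float:
    """Error rate that results from a given relative improvement."""
    return baseline * (1.0 - relative)

# Assumed baseline of 40.0% (hypothetical; the abstract only says "above 40%").
adapted_error = apply_relative_improvement(40.0, 0.106)
print(round(adapted_error, 2))
```

Under that assumed baseline, a 10.6% relative improvement corresponds to an absolute drop of about 4.24 points, to roughly 35.76%.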
Improved low-delay MDCT-based coding of both stationary and transient audio signals
Christian R. Helmrich, Goran Markovic, B. Edler
General-purpose MDCT-based audio coders like MP3 or HE-AAC utilize long inter-transform overlap and lookahead-based transform length switching to provide good coding quality for both stationary and non-stationary, i.e., transient, input signals even at low bitrates. In low-delay communication scenarios such as Voice over IP, however, algorithmic delay due to framing and overlap typically needs to be reduced and additional lookahead must be avoided. We show that these restrictions limit the performance of contemporary low-delay transform coders on either stationary or transient material and propose three modifications: an improved noise substitution technique and increased overlap between "long" transforms for stationary frames, and "long to short" transform length switching without lookahead and directly from the long overlap for transient frames. A listening test indicates the merit of these changes when integrated into AAC-LD.
DOI: https://doi.org/10.1109/ICASSP.2014.6854948 | Pages: 6954-6958 | Published: 2014-05-04
Citations: 12
New bivariate statistical model of natural image correlations
Che-Chun Su, L. Cormack, A. Bovik
We perform bivariate statistical analysis and modeling of the joint distributions of spatially adjacent sub-band responses for both luminance/chrominance and range data in natural scenes. In particular, we introduce a multivariate generalized Gaussian distribution and an exponentiated sine function to model the underlying statistics and correlations. The experimental results show that the bivariate statistics relating spatially adjacent pixels in both 2D color images and range maps are well described by the proposed models. We validate the robustness of the proposed bivariate models using a multi-variate statistical hypothesis test, and further demonstrate their effectiveness with application to a prototype depth estimation algorithm.
DOI: https://doi.org/10.1109/ICASSP.2014.6854627 | Pages: 5362-5366 | Published: 2014-05-04
Citations: 2
Selective analytic signal construction from a non-uniformly sampled bandpass signal
Jean-Adrien Vernhes, M. Chabert, B. Lacaze, G. Lesthievent, R. Baudin
This paper proposes a method that simultaneously builds the analytic signal from non-uniform samples of a bandpass signal and rejects interferences. The analytic signal is required for many onboard operations in communication satellites. This method operates in the time domain and without preliminary demodulation, using Periodic Non-uniform Sampling of order 2 (PNS2). This non-uniform sampling scheme can be easily implemented with available devices. Exact formulas for the analytic signal construction are derived for an infinite observation window (an infinite number of samples). For practical applications, the formulas should also demonstrate a high convergence rate due to the finite observation window. Formulas with increasing convergence rates are thus derived. The proposed method has been tested through simulations according to the number of available samples, the interference parameters and the filter transfer function regularity.
DOI: https://doi.org/10.1109/ICASSP.2014.6854549 | Pages: 4978-4982 | Published: 2014-05-04
Citations: 3
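For readers unfamiliar with the analytic signal mentioned in the abstract above, the sketch below builds one from uniformly spaced samples using the standard FFT construction (zeroing negative frequencies and doubling positive ones). This is deliberately not the PNS2 time-domain method of the paper; it assumes uniform sampling, and the function name `analytic_signal` is hypothetical.

```python
import numpy as np

def analytic_signal(x) -> np.ndarray:
    """Analytic signal of a uniformly sampled real sequence via the FFT.

    Standard textbook construction, not the paper's non-uniform method.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin as-is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as-is
        weights[1 : n // 2] = 2.0         # double the positive frequencies
    else:
        weights[1 : (n + 1) // 2] = 2.0
    # Negative-frequency bins keep weight 0 and are therefore removed.
    return np.fft.ifft(spectrum * weights)
```

The real part of the result reproduces the input, and the imaginary part is its discrete Hilbert transform; for a cosine that falls exactly on an FFT bin, the imaginary part is the matching sine.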
Lossless/near-lossless color image coding by inverse demosaicing
Ryo Kuroiwa, Ryo Matsuoka, Seisuke Kyochi, K. Shirai, M. Okuda
In this paper, we introduce a novel framework for lossless/near-lossless (LS/NLS) color image coding assisted by inverse demosaicing. Conventional frameworks, such as JPEG-LS, are typically based on prediction (and quantization for NLS coding) followed by entropy coding to save bit rate. The approach of this work is entirely different from the conventional ones. Basically, color images are created by demosaicing a Bayer-pattern color filter array (CFA), an operation that can be expressed with square matrices. By using the (pseudo) inverse matrix of a joint demosaicing and color-to-gray conversion, the proposed decoder can recover the color image from its corresponding gray image data, which is losslessly transmitted by the proposed encoder. Thus, LS/NLS color image reconstruction can be achieved while significantly reducing the bit rate. In addition, using the same framework of color image coding, LS/NLS CFA coding can be realized at a bit rate comparable to JPEG-LS.
DOI: https://doi.org/10.1109/ICASSP.2014.6853951 | Pages: 2011-2014 | Published: 2014-05-04
Citations: 2
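The Bayer-pattern CFA referred to in the abstract above keeps a single color sample per pixel. The sketch below shows only that forward sampling step, to make the data layout concrete; it is not the inverse-demosaicing codec proposed in the paper. The RGGB layout is an assumption (the abstract does not state one), and the function name `bayer_mosaic` is hypothetical.

```python
import numpy as np

def bayer_mosaic(rgb: np.ndarray) -> np.ndarray:
    """Sample an (H, W, 3) image on an assumed RGGB Bayer grid -> (H, W)."""
    height, width, _ = rgb.shape
    cfa = np.zeros((height, width), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red at even rows, even columns
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green at even rows, odd columns
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green at odd rows, even columns
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue at odd rows, odd columns
    return cfa
```

Because each output pixel is a copy of one input value, this sampling is a linear map from the color image to the CFA, which is consistent with the matrix view taken in the abstract.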