
2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100): Latest Papers

Efficiency of real-time Gaussian transient detectors: comparing the Karhunen-Loeve and the wavelet decompositions
Francisco M. Garcia, I. Lourtie
In general, finite-dimensional discrete-time representations of continuous-time Gaussian transients are not complete. Such representations typically lead to suboptimal detectors, where the compromise between computational complexity and processor performance requires optimization, especially when real-time processing is mandatory. This paper proposes a procedure for optimizing the processor parameters, using the Bhattacharyya distance to evaluate the resemblance between the original continuous-time signal and its finite-dimensional discrete representation. Two decompositions are analyzed and compared, namely the Karhunen-Loeve decomposition (KLD) and the discrete wavelet transform (DWT). It is shown that the DWT offers significant advantages when the signals to be detected have a large number of important eigenvalues, which is often the case in applications such as passive sonar.
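The abstract above does not state the distance formula it uses. As a rough illustration only: for two zero-mean Gaussians with diagonal covariances, the Bhattacharyya distance reduces to a sum of per-coefficient terms, so it can be tracked as more expansion coefficients are retained. The eigenvalue spectrum, the noise variance, and the choice to compare a noise-only model against a signal-plus-noise model are all assumptions made for this sketch; none of them is taken from the paper.

```python
import math

def bhattacharyya_zero_mean_diag(var_a, var_b):
    """Bhattacharyya distance between two zero-mean Gaussians whose
    covariances are diagonal, given as lists of per-coefficient variances."""
    if len(var_a) != len(var_b):
        raise ValueError("variance lists must have the same length")
    dist = 0.0
    for va, vb in zip(var_a, var_b):
        # Per-coefficient term: 0.5 * ln( ((va + vb) / 2) / sqrt(va * vb) )
        dist += 0.5 * math.log(0.5 * (va + vb) / math.sqrt(va * vb))
    return dist

def distance_vs_terms(eigenvalues, noise_var):
    """Distance between 'noise only' and 'signal plus noise' when only the
    first k coefficients are kept, for k = 1..len(eigenvalues)."""
    curve = []
    for k in range(1, len(eigenvalues) + 1):
        noise_only = [noise_var] * k
        signal_plus_noise = [lam + noise_var for lam in eigenvalues[:k]]
        curve.append(bhattacharyya_zero_mean_diag(noise_only, signal_plus_noise))
    return curve

# Hypothetical, slowly decaying eigenvalue spectrum (illustrative values only).
eigs = [4.0, 3.0, 2.0, 1.0, 0.5]
curve = distance_vs_terms(eigs, noise_var=1.0)
```

Each retained coefficient adds a positive term, so `curve` grows with k; how quickly it flattens gives a feel for the complexity-versus-performance trade-off the abstract mentions.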
DOI: https://doi.org/10.1109/ICASSP.2000.861941
Published: 2000-06-05
Citations: 4
Characterization of transient wandering tones by dynamic modeling of fractional-Fourier features
P. Ainsleigh, N. Kehtarnavaz
A novel approach is presented for characterizing transient wandering tones. These signals are segmented and approximated as time series with piecewise linear instantaneous frequency and piecewise constant amplitude. Frequency rate, center frequency, and energy features are estimated in each segment of data using chirped autocorrelations and the fractional Fourier transform. These features are tracked across segments using linear dynamical models whose parameters are estimated using an expectation-maximization algorithm. A new cross-covariance estimator for adjacent states of the dynamical model is given. The feature extraction/tracking algorithm is used to characterize a measured marine-mammal vocalization. Application of the representation algorithm to signal classification is discussed.
DOI: https://doi.org/10.1109/ICASSP.2000.859047
Published: 2000-06-05
Citations: 2
Joint source-channel MMSE-decoding of speech parameters
S. Heinen, P. Vary
For speech transmission in digital land mobile telephony, effective compression algorithms have to be used to achieve a high bandwidth efficiency. Furthermore, a variety of adverse transmission effects make it necessary to employ powerful error control techniques to keep bit error rates tolerably low and thus to guarantee a high speech quality. Speech compression is designed to remove irrelevancy and redundancy from the speech signal. Yet measuring the statistical properties of speech parameters extracted by practical compression schemes shows that a considerable amount of redundancy still remains, either in terms of non-uniform distribution or due to time-correlation of parameters extracted from subsequent speech segments. In this contribution, we propose a new minimum mean square error (MMSE) decoder for block-oriented trellis codes that is able to exploit the time-correlation of subsequent parameter sets. The decoder yields non-discrete speech parameter mean square (MS) estimates. Thus it combines two approaches to exploit residual redundancy, source controlled channel decoding (SCCD) (Hagenauer 1995) and soft bit source decoding (SBSD) (Fingscheidt and Vary 1997), in one algorithm.
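The idea of a non-discrete MMSE parameter estimate that exploits time-correlation can be sketched in a few lines. The following is a simplified illustration, not the decoder proposed in the paper: it ignores the trellis code entirely and assumes a first-order Markov model over quantizer indices, with channel likelihoods supplied from outside. The function name, the two-entry codebook, and all probabilities are hypothetical.

```python
def mmse_estimate(codebook, likelihoods, prev_posterior, transition):
    """One decoding step of a posterior-mean (MMSE) parameter estimator.

    codebook[i]       : reconstruction value of quantizer index i
    likelihoods[i]    : p(received word | index i was sent), from the channel
    prev_posterior[j] : posterior probability of index j in the previous frame
    transition[j][i]  : P(index i in this frame | index j in the previous frame)
    Returns (estimate, posterior).
    """
    n = len(codebook)
    # Predict: a priori probability of each index, carried over from the
    # previous frame through the assumed Markov transition model.
    prior = [sum(prev_posterior[j] * transition[j][i] for j in range(n))
             for i in range(n)]
    # Update: weight by the channel likelihoods and normalise.
    unnorm = [likelihoods[i] * prior[i] for i in range(n)]
    total = sum(unnorm)
    posterior = [u / total for u in unnorm]
    # The MMSE estimate is the posterior mean; it is generally not a
    # codebook entry, which is why the output is non-discrete.
    estimate = sum(p * c for p, c in zip(posterior, codebook))
    return estimate, posterior

# Hypothetical two-level quantizer with strongly correlated frames.
codebook = [-1.0, 1.0]
transition = [[0.9, 0.1], [0.1, 0.9]]
likelihoods = [0.4, 0.6]  # weak channel evidence in favour of index 1

est_flat, post_flat = mmse_estimate(codebook, likelihoods, [0.5, 0.5], transition)
est_corr, post_corr = mmse_estimate(codebook, likelihoods, [0.0, 1.0], transition)
```

With an uninformative previous frame the estimate stays close to zero; when the previous frame strongly indicated index 1, the same weak channel evidence produces an estimate much closer to +1, which is the effect of exploiting residual time-correlation.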
DOI: https://doi.org/10.1109/ICASSP.2000.861929
Published: 2000-06-05
Citations: 2
Matched filter design for diphone subspace models
K. Reinhard, M. Niranjan
Considering the perceptual importance of phonetic transitions as minimal contextual variant units, this paper addresses the problem by explicitly modelling the inter-phone dynamics covered in diphones. Subspace projections based on a time-constrained PCA (TC-PCA) are developed which focus on the temporal evolution. They reveal characteristic trajectories present in a low-dimensional spectral representation, facilitating robust parameter estimation while simultaneously optimising the discriminant information. A matched filter design is applied to a multiple-hypotheses rescoring scheme, which enables operation in a very low-dimensional parameter space. Using this multiple-hypotheses paradigm, the complementary information gained by explicitly modelling inter-phone dynamics covered in diphones is demonstrated on the TIMIT database, resulting in improved phone error rates.
DOI: https://doi.org/10.1109/ICASSP.2000.860138
Published: 2000-06-05
Citations: 1
Estimation and segmentation of a dense disparity map for 3D reconstruction
M. Rziza, A. Tamtaoui, L. Morin, D. Aboutajdine
This paper presents a new algorithm for segmenting a disparity map into planar facets. The method originates in the dense disparity map estimation process, which uses dynamic programming constrained by previously extracted interest points. The segmentation of this map uses the surface normal vector at each pixel. Matching pixels between the two images by dynamic programming yields a scattered disparity map; this map is then densified by matching contour points extracted from the two available images. Experiments with real images have validated the method and clearly show an improvement over existing methods. The dense disparity map obtained is reliable when compared to classical methods. We also obtain a normal vector map segmented into contours and homogeneous regions reflecting 3D planar facets.
DOI: https://doi.org/10.1109/ICASSP.2000.859279
Published: 2000-06-05
Citations: 11
On-line non-stationary ICA using mixture models
A. Ahmed, C. Andrieu, A. Doucet, P. Rayner
In this paper we address the problem of on-line source separation with sources modelled as mixtures of Gaussians which are linearly combined via a series of non-stationary mixing matrices. The on-line recovery of the sources from the observations is a non-linear statistical filtering problem that we address using state-of-the-art particle filter methods. Simulations are presented and satisfactory results are obtained.
DOI: https://doi.org/10.1109/ICASSP.2000.861205
Published: 2000-06-05
Citations: 9
A hybrid information maximisation (HIM) algorithm for optimal feature selection from multi-channel data
A. Al-Ani, Mohamed Deriche
A novel feature selection algorithm is derived for multi-channel data. This algorithm is a hybrid information maximisation (HIM) technique based on (1) maximising the mutual information between the input and output of a network using the infomax algorithm proposed by Linsker (1988), and (2) maximising the mutual information between outputs of different network modules using the Imax algorithm introduced by Becker (see Network Computation in Neural Systems, vol.7, p.7-31, 1996). The infomax algorithm is useful in reducing the redundancy in the output units, while the Imax algorithm is capable of selecting higher order features from the input units. In this paper, we analyse the two methods and generalise the learning procedure of the Imax algorithm to make it suitable for maximising the mutual information between multi-dimensional output units from different network modules, in contrast to the original Imax algorithm, which only maximises mutual information between two output units. We show that the proposed HIM algorithm provides a better representation of the input than either of the original two algorithms used separately. Finally, the HIM is evaluated with respect to biological plausibility in the case of feature selection from two-channel EEG data.
DOI: https://doi.org/10.1109/ICASSP.2000.860148
Published: 2000-06-05
Citations: 6
Least squares deconvolution in wavelet domain for 1/f driven LTI systems
M. Izzetoglu, Tayfun Akgül, B. Onaral, N. Bilgutay
In this paper we propose a least squares deconvolution method in the wavelet domain for linear time-invariant (LTI) systems with 1/f type input signals. We model the output of the system as the convolution of the impulse response and the input signal, which exhibits 1/f type spectral behavior. Our aim in solving the deconvolution problem is to estimate a filter which approximates the inverse of the impulse response, so that by applying this filter to the output data we can estimate the input signal. In order to achieve this objective, we use the wavelet transform and its properties for 1/f signals, where the logarithm of the variance of the wavelet coefficients in each stage progresses linearly. We define an error criterion in the wavelet domain whose minimization yields the optimum inverse filter. We present the error minimization algorithm and the simulation results.
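The wavelet-domain property that the abstract relies on, namely that the log-variance of the wavelet coefficients of a 1/f-type signal changes linearly from one decomposition stage to the next, can be checked with a straight-line fit. The sketch below is an assumption-laden illustration: it takes per-stage variances as given (here generated synthetically rather than computed from a wavelet transform), numbers the stages 1..M, and uses a sign convention in which variance grows with the stage index. It does not reproduce the paper's error criterion or inverse-filter estimation.

```python
import math

def fit_log2_variance_slope(variances):
    """Least-squares line through the points (m, log2(variance at stage m)),
    m = 1..M. Returns (slope, intercept). Needs at least two stages."""
    if len(variances) < 2:
        raise ValueError("need variances from at least two stages")
    m_vals = list(range(1, len(variances) + 1))
    y_vals = [math.log2(v) for v in variances]
    n = len(m_vals)
    mean_m = sum(m_vals) / n
    mean_y = sum(y_vals) / n
    sxx = sum((m - mean_m) ** 2 for m in m_vals)
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in zip(m_vals, y_vals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_m
    return slope, intercept

# Synthetic per-stage variances obeying var_m = sigma2 * 2**(gamma * m).
# gamma and sigma2 are hypothetical values chosen for the example.
gamma = 1.0
sigma2 = 0.5
variances = [sigma2 * 2.0 ** (gamma * m) for m in range(1, 7)]
slope, intercept = fit_log2_variance_slope(variances)
```

On these synthetic variances the fitted slope recovers the assumed exponent `gamma` and the intercept recovers `log2(sigma2)`; with variances measured from real data, the quality of the linear fit is one way to judge how well a 1/f model applies.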
DOI: https://doi.org/10.1109/ICASSP.2000.859061
Published: 2000-06-05
Citations: 1
Low complexity speaker independent command word recognition in car environments
S. Riis, O. Viikki
In this paper we compare a standard HMM based recognizer to a highly parameter-efficient hybrid, denoted the hidden neural network (HNN). The comparison was done on a speaker independent command word recognition task aimed at car hands-free applications. Monophone based HMM and HNN recognizers were initially trained on clean Wall Street Journal British English data. Evaluation of these baseline models on noisy car speech data indicated superior performance of the HMMs. After smoothing to the car environment, however, an HNN with 28k parameters provided a relative error rate reduction of 23-53% over HMMs containing 21k-168k parameters. Due to the low number of parameters in the HNNs, they have a real-time decoding complexity 2-4 times below that of comparable HMMs. The low memory and computational requirements of the HNN make it particularly attractive for implementation on portable commercial hardware like mobile phones and personal digital assistants.
DOI: https://doi.org/10.1109/ICASSP.2000.862089
Published: 2000-06-05
Citations: 12
Source adaptive software 2D iDCT with SIMD
L. Winger
This paper presents a fast two-dimensional inverse discrete cosine transform that adapts to compressed video source statistics to reduce execution time. iDCT algorithms for sparse blocks eliminate calculations for some zero coefficients and are implemented with quad-word parallel single-instruction-multiple-data (SIMD) multimedia instructions. It is observed that end-of-block marker value histograms vary little within single shots. An adaptive control mechanism is proposed that chooses the optimal set of iDCTs to prepare for an entire shot from its first frames (to reduce software overheads and penalties). This introduces no degradation of decoded video quality compared with a conventional SIMD 8×8 iDCT implemented with Intel MMX instructions. It is confirmed that execution time is reduced by an additional 15% with Murata's method for 4 Mbps MPEG2 natural video. In comparison, execution time is reduced by 22% with a modified version of Murata's method, and by 35% with the new source adaptive method.
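The saving from skipping zero coefficients in sparse blocks can be seen in a deliberately simplified setting. The sketch below is a scalar, one-dimensional, 8-point inverse DCT in plain Python that stops summing at the last nonzero coefficient. It is an assumption-based illustration of the pruning idea only: the paper's method is two-dimensional, SIMD-based, and selects among specialised iDCTs using end-of-block statistics, none of which is modelled here. The function names and the sample coefficients are hypothetical.

```python
import math

N = 8  # transform length, matching one row of an 8x8 block

def idct_1d(coeffs, last_nonzero=N - 1):
    """8-point inverse DCT (inverse of the orthonormal DCT-II).
    Coefficients after index last_nonzero are assumed to be zero and
    are skipped, which is where the pruned variant saves work."""
    out = []
    for n in range(N):
        acc = 0.0
        for k in range(last_nonzero + 1):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            acc += scale * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(acc)
    return out

def last_nonzero_index(coeffs):
    """Position of the last nonzero coefficient (0 if all are zero)."""
    for k in range(len(coeffs) - 1, -1, -1):
        if coeffs[k] != 0:
            return k
    return 0

# A sparse row: only the three lowest-frequency coefficients are nonzero.
sparse = [10.0, -3.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
full = idct_1d(sparse)                               # sums all 8 terms per sample
pruned = idct_1d(sparse, last_nonzero_index(sparse)) # sums only 3 terms per sample
```

Both calls produce the same samples, but the pruned call performs 3 multiply-accumulates per output instead of 8. A source-adaptive scheme in the spirit of the abstract would pick such a reduced transform ahead of time from observed end-of-block positions rather than scanning every row.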
DOI: https://doi.org/10.1109/ICASSP.2000.860191
Published: 2000-06-05
Citations: 4