
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Information bottleneck based speaker diarization of meetings using non-speech as side information
S. Yella, H. Bourlard
Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07, and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to a baseline speaker diarization system based on the information bottleneck framework. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly decreases the gap in performance between the information bottleneck system and the HMM/GMM system.
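For context, the display below writes out one common form of the information bottleneck objective when an irrelevant (side) variable is available; the notation and the trade-off parameters are generic assumptions for illustration and are not taken from the paper.

% Schematic IB-with-side-information objective (illustrative notation):
% X - speech segments to cluster, C - cluster assignments,
% Y - relevant variables, Z - irrelevant variables derived from non-speech.
\min_{p(c \mid x)} \; I(C;X) \;-\; \beta \bigl[\, I(C;Y) \;-\; \gamma\, I(C;Z) \,\bigr]

Penalising I(C;Z) alongside the usual compression and relevance terms discourages clusters that also capture the structure of the non-speech segments.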
DOI: https://doi.org/10.1109/ICASSP.2014.6853565 | Pages: 96-100 | Published: 2014-05-04
Citations: 12
Amplitude and phase estimator for real-time biomedical spectral Doppler applications
S. Ricci, R. Matera, A. Dallai
In a typical echo-Doppler investigation, the moving blood is periodically insonated by transmitted bursts of ultrasound energy. The echoes, shifted in frequency according to the Doppler effect, are received, coherently demodulated and processed through a spectral estimator. The detected frequency shift can be exploited for blood velocity assessment. The spectral analysis is typically performed with the conventional Fast Fourier Transform (FFT), but, recently, the Amplitude and Phase EStimator (APES) was shown to produce a good-quality sonogram from a reduced number of transmissions. Unfortunately, the much higher computational effort needed by APES hampers its use in real-time applications. In this work, a fixed-point DSP implementation of APES is presented. A spectral estimate based on 32 transmissions is computed in less than 120 μs. Results obtained from echo-Doppler investigations on a volunteer are presented.
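As a minimal illustration of how a detected frequency shift maps to blood velocity, the snippet below applies the standard narrowband Doppler relation; the function and parameter names are illustrative, and the APES implementation itself is not reproduced here.

import math

def doppler_velocity(f_shift_hz, f_tx_hz, angle_deg, c_m_s=1540.0):
    # Standard Doppler relation f_d = 2 * v * f_tx * cos(theta) / c, solved for v.
    # c defaults to the commonly assumed soft-tissue sound speed of about 1540 m/s.
    return f_shift_hz * c_m_s / (2.0 * f_tx_hz * math.cos(math.radians(angle_deg)))

# Example: a 2 kHz shift at a 5 MHz transmit frequency and a 60 degree beam-to-flow angle
print(doppler_velocity(2e3, 5e6, 60.0))  # ~0.62 m/s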
DOI: https://doi.org/10.1109/ICASSP.2014.6854584 | Pages: 5149-5152 | Published: 2014-05-04
Citations: 3
Look who's talking: Detecting the dominant speaker in a cluttered scenario
Eleonora D'Arca, N. Robertson, J. Hopgood
In this work we propose a novel method to automatically detect and localise the dominant speaker in an enclosed scenario by means of audio and video cues. The underpinning idea is that gesturing means speaking, so observing motion means observing an audio signal. To the best of our knowledge, state-of-the-art algorithms are focussed on stationary motion scenarios and close-up scenes where only one audio source exists, whereas we enlarge the scope of the method to larger fields of view and cluttered scenarios including multiple non-stationary moving speakers. In such contexts, moving objects which are not correlated with the dominant audio may exist, and their motion may incorrectly drive the audio-video (AV) correlation estimation. This suggests extra localisation data may be fused at decision level to avoid detecting false positives. In this work, we learn Mel-frequency cepstral coefficient (MFCC) features and correlate them to the optical flow. We also exploit the audio and video signals to estimate the position of the actual speaker, narrowing down the visual search space and hence reducing the probability of incurring a wrong voice-to-pixel region association. We compare our work with a state-of-the-art existing algorithm and show on real datasets a 36% precision improvement in localising a moving dominant speaker through occlusions and speech interference.
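A minimal sketch of the audio-visual correlation idea, assuming frame-synchronous audio and motion features have already been extracted; the array names and the plain Pearson correlation are illustrative, and the paper's feature learning and decision-level fusion are not reproduced.

import numpy as np

def av_correlation(audio_track, motion_track):
    # Pearson correlation between a frame-wise audio feature (e.g. an MFCC-based
    # energy proxy) and the mean optical-flow magnitude of one candidate region.
    a = (audio_track - audio_track.mean()) / (audio_track.std() + 1e-9)
    v = (motion_track - motion_track.mean()) / (motion_track.std() + 1e-9)
    return float(np.mean(a * v))

# The region whose motion correlates most strongly with the audio is the most likely
# dominant speaker; separate localisation cues can then be fused to reject false positives.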
DOI: https://doi.org/10.1109/ICASSP.2014.6853854 | Pages: 1532-1536 | Published: 2014-05-04
Citations: 13
Histogram of Log-Gabor Magnitude Patterns for face recognition
J. Yi, Fei Su
Gabor-based features have achieved excellent performance for face recognition on traditional face databases. However, on the recent LFW (Labeled Faces in the Wild) face database, Gabor-based features have attracted little attention due to their high computational complexity, high feature dimension, and poor performance. In this paper, we propose a Gabor-based feature termed the Histogram of Gabor Magnitude Patterns (HGMP), which is very simple but effective. HGMP adopts the Bag-of-Words (BoW) image representation framework. It views the Gabor filters as codewords and the Gabor magnitudes of each point as the responses of the point to these codewords. The point is then coded by orientation normalization and scale non-maximum suppression of its magnitudes, both of which are efficient to compute. Moreover, the number of codewords is so small that the feature dimension of HGMP is very low. In addition, we analyze the advantages of log-Gabor filters over Gabor filters as codewords, and propose to replace the Gabor filters in HGMP with log-Gabor filters, which produces the Histogram of Log-Gabor Magnitude Patterns (HLGMP) feature. The experimental results on LFW show that HLGMP outperforms HGMP and achieves state-of-the-art performance, even though its computational complexity and feature dimension are very low.
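The snippet below is a simplified reading of the coding step, assuming per-pixel Gabor (or log-Gabor) magnitudes have already been computed; assigning each pixel to its strongest filter and histogramming the assignments illustrates the BoW view, but it is not the authors' exact orientation-normalization and suppression rule.

import numpy as np

def magnitude_pattern_histogram(filter_mag):
    # filter_mag: (S, O, H, W) array of Gabor or log-Gabor magnitudes,
    # S scales and O orientations at every pixel of an H x W face image.
    S, O, H, W = filter_mag.shape
    responses = filter_mag.reshape(S * O, H * W)        # one row per codeword (filter)
    codes = responses.argmax(axis=0)                    # strongest codeword per pixel
    hist = np.bincount(codes, minlength=S * O).astype(float)
    return hist / hist.sum()                            # low-dimensional (S*O) descriptor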
DOI: https://doi.org/10.1109/ICASSP.2014.6853650 | Pages: 519-523 | Published: 2014-05-04
Citations: 9
Interference shaping constraints for underlay MIMO interference channels
C. Lameiro, I. Santamaría, W. Utschick
In this paper, a cognitive radio (CR) scenario comprised of a secondary interference channel (IC) and a primary point-to-point link (PPL) is studied, in which the former interferes with the latter. In order to satisfy a given rate requirement at the PPL, typical approaches impose an interference temperature (IT) constraint. When the PPL transmits multiple streams, however, the spatial structure of the interference comes into play. In such cases, we show that spatial interference shaping constraints can provide higher sum-rate performance to the IC while ensuring the required rate at the PPL. We then extend the interference leakage minimization algorithm (MinIL) to incorporate such constraints. An additional power control step is included in the optimization procedure to improve the sum rate when the interference alignment (IA) problem becomes infeasible due to the additional constraint. Numerical examples illustrate the effectiveness of the spatial shaping constraint in comparison to IT when the PPL transmits multiple data streams.
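To make the contrast concrete, the display below states a generic interference temperature constraint next to a spatial shaping constraint; the symbols are defined here for illustration and do not follow the paper's notation.

% H: cross channel from a secondary transmitter to the PPL receiver,
% Q: secondary transmit covariance, theta: interference power budget,
% S: shaping matrix chosen so that the PPL still meets its rate target.
\operatorname{tr}\!\bigl( \mathbf{H}\,\mathbf{Q}\,\mathbf{H}^{H} \bigr) \le \theta
  \quad \text{(interference temperature: limits total power only)}
\mathbf{H}\,\mathbf{Q}\,\mathbf{H}^{H} \preceq \mathbf{S}
  \quad \text{(spatial shaping: limits power and its spatial direction)}

Constraining the whole interference covariance rather than only its trace lets the secondary IC steer its leakage into spatial directions that a multi-stream PPL can tolerate.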
DOI: https://doi.org/10.1109/ICASSP.2014.6855020 | Pages: 7313-7317 | Published: 2014-05-04
Citations: 4
Hierarchical depth processing with adaptive search range and fusion
Zucheul Lee, Truong Q. Nguyen
In this paper, we present an effective hierarchical depth processing and fusion scheme for large stereo images. We propose an adaptive disparity search range based on the combined local structure of the image and the initial disparity. The adaptive search range propagates the smoothness property from the coarse level to the fine level while preserving details and suppressing undesirable errors. A spatial-multiscale total variation method is investigated to enforce the spatial and scaling consistency of multi-scale depth estimates. The experimental results demonstrate that the proposed hierarchical scheme produces high-quality, high-resolution depth maps by fusing individual multi-scale depth maps, while reducing complexity.
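A rough coarse-to-fine sketch of how a per-pixel disparity search range can be propagated between pyramid levels; the rule for widening the range is a stand-in for the paper's combined local-structure cue, and all names are illustrative.

import numpy as np

def finer_level_search_range(coarse_disp, base_radius=2.0, grad_weight=4.0):
    # coarse_disp: (H, W) disparity map estimated at the coarser pyramid level.
    # Upsample and rescale it to the next finer level, then widen the per-pixel
    # search radius where the disparity varies quickly (a proxy for depth edges).
    fine = np.kron(coarse_disp, np.ones((2, 2))) * 2.0
    gy, gx = np.gradient(fine)
    radius = base_radius + grad_weight * np.hypot(gx, gy)
    return fine - radius, fine + radius    # per-pixel (d_min, d_max) at the finer level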
DOI: https://doi.org/10.1109/ICASSP.2014.6853663 | Pages: 584-588 | Published: 2014-05-04
Citations: 1
Multi-pitch tracking using Gaussian mixture model with time varying parameters and Grating Compression Transform
M. Abhijith, P. Ghosh, K. Rajgopal
Grating Compression Transform (GCT) is a two-dimensional analysis of the speech signal which has been shown to be effective for multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply a Kalman filter framework to obtain pitch tracks, which requires training the filter parameters on true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using the time-varying means of a Gaussian mixture model (GMM), referred to as a TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance, obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all-voiced speech mixtures as well as random speech mixtures having well-separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% of multi-pitch estimates having error ≤ 20% for random mixtures and all-voiced mixtures, respectively. TVGMM also yields a lower root mean squared error in pitch track estimation than Kalman filtering.
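As a schematic of the model, the display below writes the frame-wise pitch candidates as draws from a GMM whose component means vary with time; the polynomial parameterisation of the means is an illustrative assumption rather than the paper's exact choice.

% f_t: a pitch candidate observed at frame t, K: number of simultaneous pitch tracks,
% J: order of the assumed polynomial trajectory for each component mean.
p(f_t) \;=\; \sum_{k=1}^{K} w_k \,\mathcal{N}\!\bigl( f_t ;\, \mu_k(t),\, \sigma_k^{2} \bigr),
\qquad \mu_k(t) \;=\; \sum_{j=0}^{J} a_{kj}\, t^{\,j}

Once the parameters are estimated from the GCT-derived candidates, each time-varying mean \mu_k(t) serves as the k-th pitch track.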
DOI: https://doi.org/10.1109/ICASSP.2014.6853842 | Pages: 1473-1477 | Published: 2014-05-04
Citations: 4
Objective similarity metrics for scenic bilevel images
Yuanhao Zhai, D. Neuhoff
This paper proposes new objective similarity metrics for scenic bilevel images, which are images containing natural scenes such as landscapes and portraits. Though percentage error is the most commonly used similarity metric for bilevel images, it is not always consistent with human perception. Based on hypotheses about human perception of bilevel images, this paper proposes new metrics that outperform percentage error in the sense of attaining significantly higher Pearson and Spearman-rank correlation coefficients with respect to subjective ratings. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison. The subjective ratings come from similarity evaluations described in a companion paper. Combinations of these metrics are also proposed, which exploit their complementarity to attain even better performance.
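For reference, the baseline the paper improves on is plain percentage error, sketched below; the proposed metrics (Adjusted Percentage Error, Bilevel Gradient Histogram, Connected Components Comparison) are not reproduced here.

import numpy as np

def percentage_error(img_a, img_b):
    # img_a, img_b: boolean (H, W) arrays holding the two bilevel images.
    # Fraction of pixels whose values disagree; the paper's metrics aim to
    # correlate better with subjective similarity ratings than this baseline does.
    return float(np.mean(img_a != img_b))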
DOI: https://doi.org/10.1109/ICASSP.2014.6854109 | Pages: 2793-2797 | Published: 2014-05-04
Citations: 4
Multi-group multi-way relaying with reduced number of relay antennas
R. S. Ganesan, Hussein Al-Shatri, Xiang Li, T. Weber, A. Klein
In this paper, multi-group multi-way relaying is considered. There are L groups with K nodes in each group. Each node wants to share d data streams with all the other nodes in its group. A single MIMO relay assists the communications. The relay does not have enough antennas to spatially separate the data streams. However, the relay assists in performing interference alignment at the receivers. In order to find the interference alignment solution, we generalize the concept of signal and channel alignment developed for the MIMO Y channel and the two-way relay channel to group signal alignment and group channel alignment. In comparison to conventional multi-group multi-way relaying schemes [1, 2], where at least R ≥ LKd - d antennas are required, in our proposed scheme, exploiting the multiple antennas at the nodes, only R ≥ LKd - Ld antennas are needed. The number of antennas required at the nodes to achieve this is also derived. It is shown that the proposed interference alignment based scheme achieves more degrees of freedom than the reference schemes without interference alignment.
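The antenna counts quoted in the abstract can be checked with the small helper below; the function name is illustrative.

def min_relay_antennas(L, K, d):
    # L groups, K nodes per group, d data streams per node (symbols as in the abstract).
    conventional = L * K * d - d        # R >= LKd - d   (schemes [1, 2])
    proposed = L * K * d - L * d        # R >= LKd - Ld  (group signal/channel alignment)
    return conventional, proposed

# Example: 3 groups of 3 nodes exchanging one stream each -> (8, 6) relay antennas
print(min_relay_antennas(3, 3, 1))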
DOI: https://doi.org/10.1109/ICASSP.2014.6854093 | Pages: 2714-2718 | Published: 2014-05-04
Citations: 6
Motion detection with spatiotemporal sequences
T. Zhang, Haixian Wang
In this paper we propose a new method to detect motion in greyscale video. In our algorithm, several spatiotemporal sequences with different lengths are used to filter the frames of the video. These filtered images are then combined to obtain the real motion. The performance of our algorithm is tested on several human action datasets in which different actions are performed. The detection results of our algorithm are compared with previous work and with manually extracted targets. The experimental results show that the responses of our filter are close to the real actions of the humans in the original videos.
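One plausible reading of the filtering-and-combination step is sketched below, assuming the spatiotemporal filters reduce to temporal averages over windows of different lengths; this is an illustration only, not the authors' filter design.

import numpy as np

def combined_motion_map(frames, lengths=(2, 4, 8)):
    # frames: (T, H, W) greyscale video with T larger than max(lengths) + 1.
    # For each window length L, compare the last frame against the mean of the
    # preceding L frames, then average the per-length responses into one motion map.
    responses = [np.abs(frames[-1] - frames[-L - 1:-1].mean(axis=0)) for L in lengths]
    return np.mean(responses, axis=0)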
DOI: https://doi.org/10.1109/ICASSP.2014.6854422 | Pages: 4344-4348 | Published: 2014-05-04
Citations: 0