Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853565
S. Yella, H. Bourlard
Background noise and errors in speech/non-speech detection cause significant degradation in the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of a speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07, and 09 evaluation sets show that the proposed method decreases the diarization error rate by around 18% relative to the baseline information bottleneck speaker diarization system. Comparison with a state-of-the-art system based on the HMM/GMM framework shows that the proposed method significantly narrows the performance gap between the information bottleneck system and the HMM/GMM system.
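As a rough illustration of the side-information idea, the sketch below (Python/NumPy; the distributions, segment counts, and hard clusterings are invented for illustration, not the paper's agglomerative IB procedure) scores a clustering C of segments X by I(C;Y) − β·I(C;Z), where Y is the relevant variable and Z is the irrelevant side information derived from non-speech:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats from a joint distribution table."""
    p_xy = p_xy / p_xy.sum()
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])).sum())

def ib_objective(assign, p_x, p_y_given_x, p_z_given_x, beta=1.0):
    """I(C;Y) - beta * I(C;Z) for a hard clustering `assign` of the segments."""
    n_c = int(assign.max()) + 1
    p_cy = np.zeros((n_c, p_y_given_x.shape[1]))
    p_cz = np.zeros((n_c, p_z_given_x.shape[1]))
    for x, c in enumerate(assign):
        p_cy[c] += p_x[x] * p_y_given_x[x]
        p_cz[c] += p_x[x] * p_z_given_x[x]
    return mutual_information(p_cy) - beta * mutual_information(p_cz)

# four segments, two true speakers (relevant variable Y); side information Z
# from non-speech is uninformative here, so it does not penalise either clustering
p_x = np.full(4, 0.25)
p_y_given_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
p_z_given_x = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])

good = ib_objective(np.array([0, 0, 1, 1]), p_x, p_y_given_x, p_z_given_x)
bad = ib_objective(np.array([0, 1, 0, 1]), p_x, p_y_given_x, p_z_given_x)
```

A clustering that preserves speaker-relevant structure while carrying no side information scores higher, which is exactly the behaviour the objective rewards.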
{"title":"Information bottleneck based speaker diarization of meetings using non-speech as side information","authors":"S. Yella, H. Bourlard","doi":"10.1109/ICASSP.2014.6853565","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853565","url":null,"abstract":"Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of the speaker diarization system based on information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from RT 06, 07, 09, evaluation sets have shown that the proposed method decreases the diarization error rate by around 18% relative to the baseline speaker diarization system based on information bottleneck framework. Comparison with a state of the art system based on HMM/GMM framework shows that the proposed method significantly decreases the gap in performance between the information bottleneck system and HMM/GMM system.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"81 1","pages":"96-100"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72639838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854584
S. Ricci, R. Matera, A. Dallai
In a typical echo-Doppler investigation, moving blood is periodically insonated by transmitted bursts of ultrasound energy. The echoes, shifted in frequency according to the Doppler effect, are received, coherently demodulated, and processed through a spectral estimator. The detected frequency shift can be exploited for blood velocity assessment. The spectral analysis is typically performed with the conventional Fast Fourier Transform (FFT), but the Amplitude and Phase EStimator (APES) has recently been shown to produce a good-quality sonogram from a reduced number of transmissions. Unfortunately, the much higher computational cost of APES hampers its use in real-time applications. In this work, a fixed-point DSP implementation of APES is presented. A spectral estimate based on 32 transmissions completes in less than 120 μs. Results obtained from echo-Doppler investigations on a volunteer are presented.
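The conventional FFT baseline mentioned above can be sketched as follows (Python/NumPy; this is the FFT estimator, not APES, and the carrier, PRF, and beam angle are illustrative values, not the paper's settings): a slow-time signal over 32 transmissions is transformed, the spectral peak gives the Doppler shift f_d, and velocity follows from v = f_d·c / (2·f0·cos θ).

```python
import numpy as np

# Assumed example parameters (not from the paper): 5 MHz carrier,
# 10 kHz pulse repetition frequency, 60-degree beam-to-flow angle.
c = 1540.0          # speed of sound in tissue, m/s
f0 = 5e6            # transmit (carrier) frequency, Hz
prf = 10e3          # pulse repetition frequency, Hz
theta = np.deg2rad(60.0)

v_true = 0.5        # blood velocity along the vessel, m/s
f_d = 2 * f0 * v_true * np.cos(theta) / c   # resulting Doppler shift, Hz

# one demodulated slow-time signal: 32 transmissions, as in the paper
n = np.arange(32)
samples = np.exp(2j * np.pi * f_d * n / prf)

# conventional FFT-based estimate of the Doppler shift (zero-padded for resolution)
spectrum = np.fft.fft(samples, 1024)
freqs = np.fft.fftfreq(1024, d=1.0 / prf)
f_hat = freqs[np.argmax(np.abs(spectrum))]
v_hat = f_hat * c / (2 * f0 * np.cos(theta))
```

APES replaces the FFT step with an adaptive filter-bank amplitude estimate at each frequency, which is where the extra computation the abstract mentions comes from.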
{"title":"Amplitude and phase estimator for real-time biomedical spectral Doppler applications","authors":"S. Ricci, R. Matera, A. Dallai","doi":"10.1109/ICASSP.2014.6854584","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854584","url":null,"abstract":"In a typical echo-Doppler investigation the moving blood is periodically insonated by the transmitting bursts of ultrasound energy. The echoes, shifted in frequency according to the Doppler effect, are received, coherently demodulated and processed through a spectral estimator. The detected frequency shift can be exploited for blood velocity assessment. The spectral analysis is typically performed by the conventional Fast Fourier Transform (FFT), but, recently, the application of the Amplitude and Phase EStimator (APES) was proved to produce a good quality sonogram based on a reduced number of transmissions. Unfortunately, the much higher calculation effort needed by APES hampers its use in real-time applications. In this work, a fixed point DSP implementation of APES is presented. A spectral estimate - based on 32 transmissions - occurs in less than 120μs. Results obtained on echo-Doppler investigations on a volunteer are presented.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"5149-5152"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72655653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853854
Eleonora D'Arca, N. Robertson, J. Hopgood
In this work we propose a novel method to automatically detect and localise the dominant speaker in an enclosed scenario by means of audio and video cues. The underpinning idea is that gesturing accompanies speaking, so observing motion amounts to observing an audio signal. To the best of our knowledge, state-of-the-art algorithms focus on stationary-motion scenarios and close-up scenes where only one audio source exists, whereas we extend the method to larger fields of view and cluttered scenarios including multiple non-stationary moving speakers. In such contexts, moving objects that are not correlated with the dominant audio may exist, and their motion may incorrectly drive the audio-video (AV) correlation estimation. This suggests that extra localisation data may be fused at the decision level to avoid detecting false positives. In this work, we learn Mel-frequency cepstral coefficients (MFCCs) and correlate them with the optical flow. We also exploit the audio and video signals to estimate the position of the actual speaker, narrowing down the visual search space and hence reducing the probability of incurring a wrong voice-to-pixel-region association. We compare our work with a state-of-the-art algorithm and show on real datasets a 36% precision improvement in localising a moving dominant speaker through occlusions and speech interference.
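The core audio-video association can be caricatured with a Pearson correlation between an audio energy envelope and per-person motion magnitude (Python/NumPy; the signals below are synthetic stand-ins, not actual MFCCs or optical flow, and the on/off speech envelope is an invented example):

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.arange(200)
speech_energy = (np.sin(0.3 * t) > 0).astype(float)        # stand-in for an audio envelope
motion_a = speech_energy + 0.2 * rng.standard_normal(200)  # speaker: motion tracks audio
motion_b = rng.standard_normal(200)                        # distractor: uncorrelated clutter

def av_correlation(audio, motion):
    """Pearson correlation as a crude audio-video association score."""
    return float(np.corrcoef(audio, motion)[0, 1])

score_a = av_correlation(speech_energy, motion_a)
score_b = av_correlation(speech_energy, motion_b)
```

The paper's point is precisely that in cluttered scenes a distractor's motion can still spuriously correlate with the audio, which is why the extra localisation cue is fused at the decision level.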
{"title":"Look who's talking: Detecting the dominant speaker in a cluttered scenario","authors":"Eleonora D'Arca, N. Robertson, J. Hopgood","doi":"10.1109/ICASSP.2014.6853854","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853854","url":null,"abstract":"In this work we propose a novel method to automatically detect and localise the dominant speaker in an enclosed scenario by means of audio and video cues. The underpinning idea is that gesturing means speaking, so observing motions means observing an audio signal. To the best of our knowledge state-of-the-art algorithms are focussed on stationary motion scenarios and close-up scenes where only one audio source exists, whereas we enlarge the extent of the method to larger field of views and cluttered scenarios including multiple non-stationary moving speakers. In such contexts, moving objects which are not correlated to the dominant audio may exist and their motion may incorrectly drive the audio-video (AV) correlation estimation. This suggests extra localisation data may be fused at decision level to avoid detecting false positives. In this work, we learn Mel-frequency cepstral coefficients (MFCC) coefficients and correlate them to the optical flow. We also exploit the audio and video signals to estimate the position of the actual speaker, narrowing down the visual space of search, hence reducing the probability of incurring in a wrong voice-to-pixel region association. We compare our work with a state-of-the-art existing algorithm and show on real datasets a 36% precision improvement in localising a moving dominant speaker through occlusions and speech interferences.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"150 1","pages":"1532-1536"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77400161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853650
J. Yi, Fei Su
Gabor-based features have achieved excellent performance in face recognition on traditional face databases. However, on the recent LFW (Labeled Faces in the Wild) face database, Gabor-based features have attracted little attention due to their high computational complexity, high feature dimensionality, and poor performance. In this paper, we propose a Gabor-based feature termed Histogram of Gabor Magnitude Patterns (HGMP) that is very simple but effective. HGMP adopts the Bag-of-Words (BoW) image representation framework: it views the Gabor filters as codewords and the Gabor magnitudes at each point as the responses of that point to these codewords. Each point is then coded by orientation normalization and scale non-maximum suppression of its magnitudes, both of which are efficient to compute. Moreover, the number of codewords is so small that the feature dimension of HGMP is very low. In addition, we analyze the advantages of log-Gabor filters over Gabor filters as codewords, and propose to replace the Gabor filters in HGMP with log-Gabor filters, yielding the Histogram of Log-Gabor Magnitude Patterns (HLGMP) feature. Experimental results on LFW show that HLGMP outperforms HGMP and achieves state-of-the-art performance, even though its computational complexity and feature dimension are very low.
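A rough sketch of the coding step as the abstract describes it (Python/NumPy; random values stand in for actual Gabor magnitude responses, and since the abstract does not specify the exact normalization, sum-normalization over orientations is assumed here):

```python
import numpy as np

rng = np.random.default_rng(1)
S, O, H, W = 5, 8, 16, 16                 # scales, orientations, image size
mag = rng.random((S, O, H, W))            # stand-in for Gabor magnitude responses

# orientation normalization (assumed form): divide each pixel's responses
# by their sum over orientations
norm = mag / mag.sum(axis=1, keepdims=True)

# scale non-maximum suppression: keep only the maximal scale
# per (orientation, pixel)
keep = norm == norm.max(axis=0, keepdims=True)
code = norm * keep

# pool the codes over the image into a single S*O-bin histogram,
# one bin per filter (codeword)
hist = code.sum(axis=(2, 3)).ravel()
hist = hist / hist.sum()
```

The resulting descriptor has only S·O dimensions (40 here), which matches the abstract's point that the small codebook keeps the feature dimension very low.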
{"title":"Histogram of Log-Gabor Magnitude Patterns for face recognition","authors":"J. Yi, Fei Su","doi":"10.1109/ICASSP.2014.6853650","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853650","url":null,"abstract":"The Gabor-based features have achieved excellent performances for face recognition on traditional face databases. However, on the recent LFW (Labeled Faces in the Wild) face database, Gabor-based features attract little attention due to their high computing complexity and feature dimension and poor performance. In this paper, we propose a Gabor-based feature termed Histogram of Gabor Magnitude Patterns (HGMP) which is very simple but effective. HGMP adopts the Bag-of-Words (BoW) image representation framework. It views the Gabor filters as codewords and the Gabor magnitudes of each point as the responses of the point to these codewords. Then the point is coded by the orientation normalization and scale non-maximum suppression of its magnitudes, which are efficient to compute. Moreover, the number of codewords is so small that the feature dimension of HGMP is very low. In addition, we analyze the advantages of log-Gabor filters to Gabor filters to serve as the codewords, and propose to replace Gabor filters with log-Gabor filters in HGMP, which produces the Histogram of Log-Gabor Magnitude Patterns (HLGMP) feature. The experimental results on LFW show that HLGMP outperforms HGMP and it achieves the state-of-the-art performance, although its computing complexity and feature dimension are very low.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"55 1","pages":"519-523"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77634745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6855020
C. Lameiro, I. Santamaría, W. Utschick
In this paper, a cognitive radio (CR) scenario comprising a secondary interference channel (IC) and a primary point-to-point link (PPL) is studied, where the former interferes with the latter. To satisfy a given rate requirement at the PPL, typical approaches impose an interference temperature (IT) constraint. When the PPL transmits multiple streams, however, the spatial structure of the interference comes into play. In such cases, we show that spatial interference shaping constraints can provide higher sum-rate performance for the IC while ensuring the required rate at the PPL. We then extend the interference leakage minimization algorithm (MinIL) to incorporate such constraints. An additional power control step is included in the optimization procedure to improve the sum-rate when the interference alignment (IA) problem becomes infeasible due to the additional constraint. Numerical examples illustrate the effectiveness of the spatial shaping constraint in comparison to IT when the PPL transmits multiple data streams.
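Why total interference power is not the whole story when the PPL is MIMO can be shown in a few lines (Python/NumPy; the decoding direction and covariances are illustrative, not from the paper): two interference covariances with identical interference temperature (trace) can load very differently onto the PPL's decoding direction.

```python
import numpy as np

u = np.array([1.0, 0.0])          # PPL receive/decoding direction (illustrative)

R_aligned = np.diag([1.0, 0.0])   # interference covariance hitting the PPL's stream
R_shaped = np.diag([0.0, 1.0])    # same total power, spatially steered away

# interference temperature sees no difference between the two
it_aligned = float(np.trace(R_aligned))
it_shaped = float(np.trace(R_shaped))

# but the interference the PPL actually decodes through differs completely
seen_aligned = float(u @ R_aligned @ u)
seen_shaped = float(u @ R_shaped @ u)
```

A shaping constraint of the form u^H R u ≤ threshold captures this spatial structure, which is the freedom the paper exploits to raise the IC's sum-rate.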
{"title":"Interference shaping constraints for underlay MIMO interference channels","authors":"C. Lameiro, I. Santamaría, W. Utschick","doi":"10.1109/ICASSP.2014.6855020","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6855020","url":null,"abstract":"In this paper, a cognitive radio (CR) scenario comprised of a secondary interference channel (IC) and a primary point-to-point link (PPL) is studied, when the former interferes the latter. In order to satisfy a given rate requirement at the PPL, typical approaches impose an interference temperature constraint (IT). When the PPL transmits multiple streams, however, the spatial structure of the interference comes into play. In such cases, we show that spatial interference shaping constraints can provide higher sum-rate performance to the IC while ensuring the required rate at the PPL. Then, we extend the interference leakage minimization algorithm (MinIL) to incorporate such constraints. An additional power control step is included in the optimization procedure to improve the sum-rate when the interference alignment (IA) problem becomes infeasible due to the additional constraint. Numerical examples are provided to illustrate the effectiveness of the spatial shaping constraint in comparison to IT when the PPL transmits multiple data streams.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"49 1","pages":"7313-7317"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77731050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853663
Zucheul Lee, Truong Q. Nguyen
In this paper, we present an effective hierarchical depth processing and fusion scheme for large stereo images. We propose an adaptive disparity search range based on the combined local structure of the image and the initial disparity. The adaptive search range propagates the smoothness property from the coarse level to the fine level while preserving details and suppressing undesirable errors. A spatial-multiscale total variation method is investigated to enforce the spatial and scaling consistency of multi-scale depth estimates. Experimental results demonstrate that the proposed hierarchical scheme produces high-quality, high-resolution depth maps by fusing individual multi-scale depth maps, while reducing complexity.
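One simple, hypothetical form of such a per-pixel range is sketched below (Python/NumPy); the paper derives its range from local image and disparity structure, which the abstract does not fully specify, so this sketch just doubles the coarse disparity when moving down a pyramid level and searches a fixed margin around it:

```python
import numpy as np

def fine_search_range(coarse_disp, margin=2):
    """Per-pixel disparity search interval at the next (finer) pyramid level.

    Hypothetical rule: disparities double when image resolution doubles,
    so centre the search at 2*d and allow +/- `margin` pixels of refinement.
    A structure-adaptive margin would widen near edges and narrow in
    smooth regions.
    """
    center = 2 * np.asarray(coarse_disp)
    return center - margin, center + margin

# coarse-level disparity estimates for a 2x2 patch (toy values)
lo, hi = fine_search_range(np.array([[3, 4], [5, 6]]))
```

Restricting each fine-level pixel to a narrow interval around its upsampled coarse estimate is what lets the hierarchy propagate smoothness while keeping the matching cost cheap.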
{"title":"Hierarchical depth processing with adaptive search range and fusion","authors":"Zucheul Lee, Truong Q. Nguyen","doi":"10.1109/ICASSP.2014.6853663","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853663","url":null,"abstract":"In this paper, we present an effective hierarchical depth processing and fusion for large stereo images. We propose the adaptive disparity search range based on the combined local structure from image and initial disparity. The adaptive search range can propagate the smoothness property at the coarse level to the fine level while preserving details and suppressing undesirable errors. The spatial-multiscale total variation method is investigated to enforce the spatial and scaling consistency of multi-scale depth estimates. The experimental results demonstrate that the proposed hierarchical scheme produces high quality and high resolution depth maps by fusing individual multi-scale depth maps, while reducing complexity.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"584-588"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77858538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6853842
M. Abhijith, P. Ghosh, K. Rajgopal
The Grating Compression Transform (GCT) is a two-dimensional analysis of the speech signal that has been shown to be effective for multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using the GCT apply a Kalman filter framework to obtain pitch tracks, which requires training the filter parameters on true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using the time-varying means of a Gaussian mixture model (GMM), referred to as a TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame of a given utterance, obtained from different patches of the spectrogram using the GCT. We evaluate the performance of the proposed method on all-voiced speech mixtures as well as random speech mixtures having well-separated and close pitch tracks. The TVGMM achieves multi-pitch tracking with 51% and 53% of multi-pitch estimates having error ≤ 20% for random mixtures and all-voiced mixtures, respectively. The TVGMM also yields lower root mean squared error in pitch track estimation than Kalman filtering.
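The assignment step of such a time-varying-means mixture can be sketched as follows (Python/NumPy; linear track means, a shared variance, and exactly two pitch candidates per frame are simplifying assumptions, and only a single E-step is shown, not the full parameter estimation):

```python
import numpy as np

rng = np.random.default_rng(2)

# two hypothetical time-varying component means: m_k(t) = a_k + b_k * t
a = np.array([120.0, 220.0])   # Hz intercepts
b = np.array([0.1, -0.1])      # Hz-per-frame slopes
sigma = 5.0                    # shared std dev of each mixture component

frames = np.arange(100)
means = a + b * frames[:, None]                  # (100, 2): track means per frame

# synthetic per-frame pitch candidates, one near each track, with noise
pitches = means + 1.0 * rng.standard_normal((100, 2))

# E-step: posterior responsibility of each track for each candidate
log_lik = -0.5 * ((pitches[:, :, None] - means[:, None, :]) / sigma) ** 2
resp = np.exp(log_lik - log_lik.max(axis=2, keepdims=True))
resp /= resp.sum(axis=2, keepdims=True)
assignments = resp.argmax(axis=2)                # (100, 2): candidate -> track
```

Because the means vary with time, crossing or drifting pitch tracks are handled by the model itself rather than by a separately trained tracker, which is the unsupervised advantage the abstract claims over the Kalman approach.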
{"title":"Multi-pitch tracking using Gaussian mixture model with time varying parameters and Grating Compression Transform","authors":"M. Abhijith, P. Ghosh, K. Rajgopal","doi":"10.1109/ICASSP.2014.6853842","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6853842","url":null,"abstract":"Grating Compression Transform (GCT) is a two-dimensional analysis of speech signal which has been shown to be effective in multi-pitch tracking in speech mixtures. Multi-pitch tracking methods using GCT apply Kalman filter framework to obtain pitch tracks which requires training of the filter parameters using true pitch tracks. We propose an unsupervised method for obtaining multiple pitch tracks. In the proposed method, multiple pitch tracks are modeled using time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated using multiple pitch values at each frame in a given utterance obtained from different patches of the spectrogram using GCT. We evaluate the performance of the proposed method on all voiced speech mixtures as well as random speech mixtures having well separated and close pitch tracks. TVGMM achieves multi-pitch tracking with 51% and 53% multi-pitch estimates having error ≤ 20% for random mixtures and all-voiced mixtures respectively. TVGMM also results in lower root mean squared error in pitch track estimation compared to that by Kalman filtering.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"60 1","pages":"1473-1477"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78168216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854109
Yuanhao Zhai, D. Neuhoff
This paper proposes new objective similarity metrics for scenic bilevel images, which are images containing natural scenes such as landscapes and portraits. Though percentage error is the most commonly used similarity metric for bilevel images, it is not always consistent with human perception. Based on hypotheses about human perception of bilevel images, this paper proposes new metrics that outperform percentage error in the sense of attaining significantly higher Pearson and Spearman rank correlation coefficients with respect to subjective ratings. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram, and Connected Components Comparison. The subjective ratings come from similarity evaluations described in a companion paper. Combinations of these metrics are also proposed, which exploit their complementarity to attain even better performance.
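The baseline's failure mode is easy to reproduce (Python/NumPy; the images are toy examples, and since the abstract does not define Adjusted Percentage Error, only plain percentage error is shown): a one-pixel shift of a shape and scattered speckle noise can receive identical scores despite looking very different.

```python
import numpy as np

def percentage_error(a, b):
    """Fraction of differing pixels between two bilevel images."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return float((a != b).mean())

ref = np.zeros((8, 8), dtype=bool)
ref[2:6, 2:6] = True                      # a 4x4 foreground square

shifted = np.roll(ref, 1, axis=1)         # square moved right by one pixel

speckled = ref.copy()
for r, c in [(0, 0), (0, 7), (7, 0), (7, 7), (0, 3), (3, 0), (7, 3), (3, 7)]:
    speckled[r, c] = True                 # eight isolated noise pixels

pe_shift = percentage_error(ref, shifted)       # 8 differing pixels
pe_speckle = percentage_error(ref, speckled)    # also 8 differing pixels
```

Both distortions flip eight of the 64 pixels, so percentage error rates them equally, even though a viewer would judge the shifted square far more similar to the original; metrics based on gradients or connected components can tell the two cases apart.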
{"title":"Objective similarity metrics for scenic bilevel images","authors":"Yuanhao Zhai, D. Neuhoff","doi":"10.1109/ICASSP.2014.6854109","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854109","url":null,"abstract":"This paper proposes new objective similarity metrics for scenic bilevel images, which are images containing natural scenes such as landscapes and portraits. Though percentage error is the most commonly used similarity metric for bilevel images, it is not always consistent with human perception. Based on hypotheses about human perception of bilevel images, this paper proposes new metrics that outperform percentage error in the sense of attaining significantly higher Pearson and Spearman-rank correlation coefficients with respect to subjective ratings. The new metrics include Adjusted Percentage Error, Bilevel Gradient Histogram and Connected Components Comparison. The subjective ratings come from similarity evaluations described in a companion paper. Combinations of these metrics are also proposed, which exploit their complementarity to attain even better performance.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"96 1","pages":"2793-2797"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80127164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854093
R. S. Ganesan, Hussein Al-Shatri, Xiang Li, T. Weber, A. Klein
In this paper, multi-group multi-way relaying is considered. There are L groups with K nodes each, and each node wants to share d data streams with all the other nodes in its group. A single MIMO relay assists the communication. The relay does not have enough antennas to spatially separate the data streams; however, it assists in performing interference alignment at the receivers. To find the interference alignment solution, we generalize the concepts of signal and channel alignment, developed for the MIMO Y channel and the two-way relay channel, to group signal alignment and group channel alignment. In comparison to conventional multi-group multi-way relaying schemes [1, 2], where at least R ≥ LKd - d relay antennas are required, our proposed scheme exploits the multiple antennas at the nodes so that only R ≥ LKd - Ld antennas are needed. The number of antennas required at the nodes to achieve this is also derived. It is shown that the proposed interference alignment based scheme achieves more degrees of freedom than reference schemes without interference alignment.
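The antenna-count comparison is simple arithmetic; the sketch below (Python; the configuration values are an invented example) just evaluates both bounds:

```python
def antennas_conventional(L, K, d):
    """Relay antennas needed by conventional multi-group multi-way schemes."""
    return L * K * d - d

def antennas_proposed(L, K, d):
    """Relay antennas needed with group signal/channel alignment."""
    return L * K * d - L * d

# example: L = 3 groups, K = 4 nodes per group, d = 2 streams per node
L, K, d = 3, 4, 2
saving = antennas_conventional(L, K, d) - antennas_proposed(L, K, d)
```

The saving is (L − 1)·d antennas, so it grows with the number of groups; four relay antennas are spared in this example.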
{"title":"Multi-group multi-way relaying with reduced number of relay antennas","authors":"R. S. Ganesan, Hussein Al-Shatri, Xiang Li, T. Weber, A. Klein","doi":"10.1109/ICASSP.2014.6854093","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854093","url":null,"abstract":"In this paper, multi-group multi-way relaying is considered. There are L groups with K nodes in each group. Each node wants to share d data streams with all the other nodes in its group. A single MIMO relay assists the communications. The relay does not have enough antennas to spatially separate the data streams. However, the relay assists in performing interference alignment at the receivers. In order to find the interference alignment solution, we generalize the concept of signal and channel alignment developed for the MIMO Y channel and the two-way relay channel to group signal alignment and group channel alignment. In comparison to conventional multi-group multi-way relaying schemes [1, 2], where at least R ≥ LKd - d antennas are required, in our proposed scheme, exploiting the multiple antennas at the nodes, only R ≥ LKd - Ld antennas are needed. The number of antennas required at the nodes to achieve this is also derived. It is shown that the proposed interference alignment based scheme achieves more degrees of freedom than the reference schemes without interference alignment.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"30 1","pages":"2714-2718"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80223113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-05-04 | DOI: 10.1109/ICASSP.2014.6854422
T. Zhang, Haixian Wang
In this paper we propose a new method to detect motion in a greyscale video. In our algorithm, several spatiotemporal sequences with different lengths are used to filter the frames of the video, and the filtered images are then combined to recover the true motion. The performance of our algorithm is tested on several human action datasets in which different actions are performed. The detection results are compared with previous work and with targets extracted manually. The experimental results show that the responses of our filter are close to the real action of the human in the original video.
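A crude stand-in for multi-length temporal filtering is sketched below (Python/NumPy; the difference filters, the combination-by-minimum rule, and the synthetic moving-pixel video are all assumptions for illustration, not the paper's actual filters):

```python
import numpy as np

def motion_response(frames, lengths=(2, 4, 8)):
    """Combine frame differences taken over several temporal window lengths.

    For each window length Lw, respond to |frame[t] - frame[t - Lw + 1]|;
    the per-length responses are then combined by a pixelwise minimum,
    keeping only motion supported at every temporal scale.
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    tail = T - max(lengths) + 1            # frames for which every length is defined
    responses = []
    for Lw in lengths:
        diff = np.abs(frames[Lw - 1:] - frames[: T - Lw + 1])
        responses.append(diff[-tail:])     # align all responses to the common tail
    return np.minimum.reduce(responses)

# synthetic greyscale video: one bright pixel moving one column per frame
T, H, W = 12, 1, 16
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 0, t] = 1.0

resp = motion_response(frames)
```

Only the pixel's current position fires at every temporal scale, so the minimum suppresses the trailing ghosts that single-length frame differencing leaves behind.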
{"title":"Motion detection with spatiotemporal sequences","authors":"T. Zhang, Haixian Wang","doi":"10.1109/ICASSP.2014.6854422","DOIUrl":"https://doi.org/10.1109/ICASSP.2014.6854422","url":null,"abstract":"In this paper we propose a new method to detect motion in a greyscale video. In our algorithm, several spatiotemporal sequences with different lengths are used to filter the frames in the video. Then these filtered images are combined together to get the real motion. The performance of our algorithm is tested with several human action datasets in which different actions are performed. The detected results of our algorithm are compared with previous works and the targets we extract manually. The experimental results show that the responses of our filter are close to the real action of the human in the original video.","PeriodicalId":6545,"journal":{"name":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"6 1","pages":"4344-4348"},"PeriodicalIF":0.0,"publicationDate":"2014-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79345497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}