Pub Date: 2009-02-24. DOI: 10.1109/TVT.2009.2015670
Rate-optimal MIMO transmission with mean and covariance feedback at low SNR
R. Gohary, W. Mesbah, T. Davidson
We consider a multiple-input multiple-output (MIMO) wireless communication scenario in which the channel follows a general spatially correlated complex Gaussian distribution with non-zero mean. We derive an explicit characterization of the optimal input covariance from an ergodic-rate perspective for systems that operate at low SNRs. This characterization is in terms of the eigendecomposition of a matrix that depends on the mean and the covariance of the channel, and it typically results in a beamforming strategy along the principal eigenvector of that matrix. Simulation results show the potential impact of (jointly) exploiting the mean and the covariance of the channel on the ergodic achievable rate at both low and moderate-to-high SNRs.
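The eigendecomposition-based strategy above can be sketched numerically. The matrix `M = H_mean^H H_mean + R_tx` below is an illustrative combination of the channel mean and covariance, not the paper's exact expression; the point is forming a unit-trace, rank-one input covariance along the principal eigenvector:

```python
import numpy as np

def beamforming_covariance(H_mean, R_tx):
    """Unit-trace, rank-one input covariance along the principal eigenvector
    of M. M = H_mean^H H_mean + R_tx is an assumed stand-in for the paper's
    mean-and-covariance-dependent matrix."""
    M = H_mean.conj().T @ H_mean + R_tx
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    v = eigvecs[:, -1]                      # principal (unit-norm) eigenvector
    return np.outer(v, v.conj())            # rank-one covariance, trace 1

rng = np.random.default_rng(0)
H_mean = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
R_tx = A @ A.conj().T                       # Hermitian PSD transmit covariance
Q = beamforming_covariance(H_mean, R_tx)    # beamforming input covariance
```

All transmit power is poured into one spatial direction, which is the typical low-SNR outcome the abstract describes.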
Pub Date: 2008-10-06. DOI: 10.1109/ICASSP.2008.4517787
Complexity adaptive H.264 encoding using multiple reference frames
Sui-Yuk Lam, O. Au, P. Wong
The state-of-the-art H.264/AVC video coding standard achieves significant improvements in coding efficiency by introducing many new coding techniques. However, the computational complexity inevitably increases in both the encoding and decoding processes. Many previous works, such as fast motion estimation and fast mode decision algorithms, have been proposed to reduce the encoder complexity while maintaining the coding efficiency. In this paper, we propose a new encoding approach that accounts for the decoding complexity. Simulation results show that the decoding complexity can be reduced by up to 15% in terms of motion compensation operations, the most complex part of the decoder, while maintaining the R-D performance with only about 0.1 dB degradation.
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518530
Multisensor very low bit rate speech coding using segment quantization
A. McCree, K. Brady, T. Quatieri
We present two approaches to noise-robust very low bit rate speech coding using wideband MELP analysis/synthesis. Both methods exploit multiple acoustic and non-acoustic input sensors, using our previously presented dynamic waveform fusion algorithm to simultaneously perform waveform fusion, noise suppression, and cross-channel noise cancellation. One coder uses a 600 bps scalable phonetic vocoder, with a phonetic speech recognizer followed by joint predictive vector quantization of the error in the wideband MELP parameters. The second coder operates at 300 bps with fixed 80 ms segments, using novel variable-rate multistage matrix quantization techniques. Formal test results show that both coders achieve intelligibility equivalent to the 2.4 kbps NATO standard MELPe coder in harsh acoustic noise environments, at much lower bit rates and with only modest quality loss.
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4517976
Improved image authentication using closed-form compensation and spread-spectrum watermarking
S. Ababneh, R. Ansari, A. Khokhar
This paper presents an image authentication scheme based on compensated watermarking, employing a Lagrangian-based closed-form solution to compensate for the signature perturbation caused by the embedding operation. The proposed scheme uses a spread-spectrum watermarking technique and a blind detector, making it attractive for applications in which the original image is not available at the time of authentication. Existing compensated signature embedding frameworks use an iterative mechanism to reach the desired compensation. The iterative approach is time-consuming and less effective than the closed-form approach proposed in this paper, which performs an accurate compensation in one step while satisfying a least-mean-square distortion criterion that guarantees image fidelity. Simulation results are presented to show the proposed scheme's efficiency and accuracy.
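A minimal sketch of the spread-spectrum embedding and blind correlation detection that the scheme builds on (the direct-domain embedding, strength `alpha`, and detection threshold are illustrative choices, not the paper's parameters):

```python
import numpy as np

def embed(signal, key, alpha=0.1):
    """Add a keyed pseudo-random +/-1 spreading sequence, scaled by alpha."""
    rng = np.random.default_rng(key)
    p = rng.choice([-1.0, 1.0], size=signal.shape)
    return signal + alpha * p

def blind_detect(received, key, threshold=0.05):
    """Blind detection: regenerate the keyed spreading sequence and
    correlate; the original (unmarked) signal is never needed."""
    rng = np.random.default_rng(key)
    p = rng.choice([-1.0, 1.0], size=received.shape)
    corr = float(np.dot(received, p) / received.size)
    return corr > threshold

rng = np.random.default_rng(1)
host = rng.standard_normal(10_000)   # stand-in for image coefficients
marked = embed(host, key=42)
```

The correlation statistic concentrates near `alpha` when the watermark is present and near zero otherwise, which is what makes blind detection possible.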
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518050
Robust sensor estimation using temporal information
Chao Yuan, Claus Neubauer
We propose a dynamic Bayesian framework for sensor estimation, a critical step in many machine condition monitoring systems. The temporal behavior of normal sensor data is described by a stationary switching autoregressive (SSAR) model that possesses two advantages over traditional switching autoregressive (SAR) models. First, the SSAR model removes the time dependency of signals during mode switching and fits sensor data better. Second, the SSAR model is stationary in that, at each time, the sensor data have the same distribution, which represents the normal operating range of the system; this ensures that estimates are accurate and are not distracted by deviations. During monitoring, the deviation covariance is estimated adaptively, which effectively handles variable levels of deviation. Tests on gas turbine data are presented.
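For intuition, a plain two-mode switching AR(1) process (the traditional SAR baseline that the paper improves on, not its stationarity-preserving SSAR construction) can be simulated as follows; all parameters are illustrative:

```python
import numpy as np

def simulate_sar(T, coeffs, trans, noise_std, seed=0):
    """Two-mode switching AR(1): a hidden Markov mode selects which AR
    coefficient drives the signal at each step."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    modes = np.zeros(T, dtype=int)
    m = 0
    for t in range(1, T):
        m = rng.choice(2, p=trans[m])       # Markov mode transition
        modes[t] = m
        x[t] = coeffs[m] * x[t - 1] + noise_std * rng.standard_normal()
    return x, modes

x, modes = simulate_sar(
    500,
    coeffs=[0.9, 0.2],                      # per-mode AR coefficients
    trans=np.array([[0.95, 0.05], [0.05, 0.95]]),
    noise_std=0.1,
)
```

In such a plain SAR model the marginal distribution of `x[t]` shifts when the mode switches; the paper's SSAR model is designed so the marginal stays fixed at the normal operating range.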
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4517779
A probabilistic union approach to robust face recognition with partial distortion and occlusion
Jie Lin, J. Ming, D. Crookes
This paper presents a new approach to face recognition where the images are subject to unknown, partial distortion/occlusion. The new approach is a probabilistic decision-based neural network (PDBNN), built on a statistical method called the posterior union model (PUM). PUM is an approach for ignoring severely mismatched local features and focusing the recognition mainly on the matched local features. It thereby improves the robustness while assuming no prior information about the corruption. We call the new approach the posterior union decision-based neural network (PUDBNN). The new PUDBNN has been evaluated on two face image databases, XM2VTS and ORL, using testing images subjected to various types of partial distortion and occlusion. The new system has demonstrated improved performance over other systems.
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518702
A novel approach to part-of-speech tagging based on latent analogy
J. Bellegarda
Part-of-speech tagging is a necessary pre-processing step for many natural language tasks. Recent statistical approaches, such as conditional random fields, rely on well-chosen feature functions to ensure that important characteristics of the empirical training distribution are reflected in the trained model. In practice, however, it is not always clear how best to select these feature functions to obtain a suitably robust model. This paper proposes an alternative strategy based on the principle of latent analogy. For each sentence under consideration, we construct a neighborhood of globally relevant training sentences through an appropriate data-driven mapping of the input surface form. Tagging then proceeds via locally optimal sequence alignment and maximum-likelihood position scoring. Empirical evidence shows that this solution is competitive with state-of-the-art Markovian techniques.
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518304
I/Q imbalance mitigation for STBC MIMO-OFDM communication systems
Mingzheng Cao, H. Ge
In this work we study the performance degradation caused by in-phase/quadrature (I/Q) imbalance in space-time block coded (STBC) multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) communication systems. The 2-Tx Alamouti scheme, the 4-Tx quasi-orthogonal STBC (QOSTBC) scheme, and the 4-Tx rotated QOSTBC (RQOSTBC) scheme with I/Q imbalance are examined in detail. Our study shows that I/Q imbalance causes severe distortion in STBC MIMO-OFDM systems. By exploiting the structure of the received signal, we develop low-complexity solutions that successfully mitigate the resulting distortion.
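The distortion studied here is usually written with the standard baseband model y = mu*x + nu*conj(x), where the image term nu*conj(x) vanishes for a perfect front end. A small sketch (the gain and phase mismatch values are illustrative):

```python
import numpy as np

def iq_imbalance(x, gain=1.05, phase_deg=3.0):
    """Standard baseband I/Q imbalance model y = mu*x + nu*conj(x);
    a perfect front end (gain=1, phase=0) gives mu=1 and nu=0."""
    phi = np.deg2rad(phase_deg)
    mu = (1 + gain * np.exp(-1j * phi)) / 2
    nu = (1 - gain * np.exp(1j * phi)) / 2
    return mu * x + nu * np.conj(x)

x = np.exp(2j * np.pi * np.arange(8) / 8)   # unit-modulus test tone
y = iq_imbalance(x)                         # distorted by the image term
```

The ratio |mu|^2 / |nu|^2 (image-rejection ratio) quantifies how severe the self-interference is; mitigation schemes exploit the known conjugate structure of the image term.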
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518093
Query by humming of MIDI and audio using locality sensitive hashing
M. Ryynänen, Anssi Klapuri
This paper proposes a query-by-humming method based on locality sensitive hashing (LSH). The method constructs an index of melodic fragments by extracting pitch vectors from a database of melodies. In retrieval, the method automatically transcribes a sung query into notes and then extracts pitch vectors in the same way as during index construction. For each query pitch vector, the method searches for similar melodic fragments in the database to obtain a list of candidate melodies; this search is performed efficiently using LSH. The candidate melodies are ranked by their distance to the entire query and returned to the user. In our experiments, the method achieved a mean reciprocal rank of 0.885 over 2797 queries when searching a database of 6030 MIDI melodies. To retrieve audio signals, we apply an automatic melody transcription method to construct the melody database directly from music recordings and report the corresponding retrieval results.
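The LSH step can be illustrated with sign-of-random-projection hashing: each pitch vector is reduced to a short bit key, and similar vectors tend to share a key, so candidate fragments are found by bucket lookup instead of a linear scan. The dimensionalities and hyperplane count below are illustrative, not the paper's settings:

```python
import numpy as np

def lsh_hash(vectors, planes):
    """Sign-of-random-projection LSH: each vector maps to a bit tuple
    (its bucket key); similar vectors are likely to share a key."""
    bits = (vectors @ planes.T) > 0
    return [tuple(row) for row in bits]

rng = np.random.default_rng(0)
planes = rng.standard_normal((8, 12))   # 8 hyperplanes for 12-dim pitch vectors
melody = rng.standard_normal(12)        # an indexed melodic fragment
other = rng.standard_normal(12)         # an unrelated fragment

# Hashing the same fragment twice always yields the same bucket key.
h = lsh_hash(np.stack([melody, melody, other]), planes)
```

At query time, only fragments whose key matches (or nearly matches) the query's key need to be compared in full, which is what makes retrieval over thousands of melodies fast.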
Pub Date: 2008-05-12. DOI: 10.1109/ICASSP.2008.4518174
Localization of chemical sources using stochastic differential equations
Ashraf Atalla, A. Jeremic
Localization of chemical sources and prediction of their spread are important in many applications. We propose a computationally efficient framework for localizing low-intensity chemical sources using stochastic differential equations. The main advantage of this technique is that it accounts for random effects, such as Brownian motion, that are not captured by commonly used classical techniques based on Fick's law of diffusion. We model the dispersion using the Fokker-Planck equation and derive the corresponding inverse model. We then derive the maximum-likelihood estimator of the source intensity, location, and release time, and we demonstrate the applicability of our results using numerical examples.
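As a simplified stand-in for the Fokker-Planck forward model and the maximum-likelihood inversion, the sketch below uses the 1-D free-space diffusion kernel and a grid search over source location only (intensity and release time are held fixed, unlike the paper's joint estimator). Under Gaussian measurement noise, minimizing the squared residual against the forward model is the ML estimate:

```python
import numpy as np

def concentration(x_src, t0, q, sensors, t, D=1.0):
    """Mean concentration from a point release: 1-D free-space diffusion
    (heat) kernel with diffusivity D, intensity q, release time t0."""
    tau = t - t0
    return q * np.exp(-(sensors - x_src) ** 2 / (4 * D * tau)) \
        / np.sqrt(4 * np.pi * D * tau)

def ml_localize(readings, sensors, t, grid):
    """Grid-search ML location estimate under Gaussian noise: minimize the
    squared residual between readings and the forward model (q=1, t0=0
    held fixed for simplicity)."""
    errs = [np.sum((readings - concentration(x, 0.0, 1.0, sensors, t)) ** 2)
            for x in grid]
    return grid[int(np.argmin(errs))]

sensors = np.linspace(-5.0, 5.0, 11)
readings = concentration(1.0, 0.0, 1.0, sensors, t=2.0)   # noiseless readings
est = ml_localize(readings, sensors, t=2.0, grid=np.linspace(-5.0, 5.0, 101))
```

With noiseless readings the grid search recovers the true source location; with noisy readings the same residual minimization remains the ML estimate, just with estimation error.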