Title: A new cumulant based parameter estimation method for noncausal autoregressive systems
Authors: Chong-Yung Chi, J. Hwang, C. Rau
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-09-01 | DOI: 10.1109/ICASSP.1994.389877
Abstract: This paper describes a new nonlinear parameter-estimation method for noncausal autoregressive (AR) systems, based on a new quadratic equation relating the unknown AR parameters to higher-order (≥3) cumulants of the non-Gaussian output measurements in the presence of additive Gaussian noise. The method is applicable whether or not the order of the system is known in advance, and it also applies to causal AR systems. Simulation results are provided to demonstrate the effectiveness of the proposed method.
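As a hedged illustration of the cumulant machinery this abstract relies on (not the paper's estimation method itself), the sketch below estimates a sample third-order cumulant and shows why additive Gaussian noise drops out: Gaussian data have vanishing third-order cumulants, while a skewed non-Gaussian sequence does not. The function name and test distributions are assumptions for the demo.

```python
import random

def third_order_cumulant(x, tau1, tau2):
    """Sample estimate of C3(tau1, tau2) = E[x(t) x(t+tau1) x(t+tau2)].
    For zero-mean data the third-order cumulant equals the third moment."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    lo = max(0, -tau1, -tau2)
    hi = min(n, n - tau1, n - tau2)
    return sum(xc[t] * xc[t + tau1] * xc[t + tau2] for t in range(lo, hi)) / n

rng = random.Random(0)
# Gaussian data: third-order cumulants vanish asymptotically, which is why
# cumulant-based estimators suppress additive Gaussian noise.
g = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Skewed non-Gaussian data: Exp(1) - 1 has third central moment 2.
e = [rng.expovariate(1.0) - 1.0 for _ in range(20000)]
print(third_order_cumulant(g, 0, 0), third_order_cumulant(e, 0, 0))
```

The Gaussian estimate should hover near zero while the exponential one stays near 2, the separation the method exploits.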
Title: Using Gaussian mixture modeling in speech recognition
Authors: Yaxin Zhang, M. Alder, R. Togneri
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-05-26 | DOI: 10.1109/ICASSP.1994.389219
Abstract: The paper describes a speaker-independent isolated-word recognition system that uses a well-known technique: the combination of vector quantization with hidden Markov modeling. In this system, the conventional vector quantization algorithm is replaced by a statistical clustering algorithm, the expectation-maximization (EM) algorithm. Based on an investigation of the data space, phonemes were manually extracted from the training data and used to generate the Gaussians of a codebook in which each code word is a Gaussian rather than a centroid vector of a data class. Word-based hidden Markov modeling was then performed. Two English isolated-digit databases were investigated, with 12 mel-spaced filter-bank coefficients employed as the input features. Compared with a conventional discrete HMM, the present system obtained a significant improvement in recognition accuracy.
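The codebook idea above (each code word a Gaussian fitted by EM rather than a centroid) can be sketched in one dimension. This is an illustrative toy, not the paper's system; the function name, quantile initialization, and synthetic two-cluster data are assumptions for the demo.

```python
import math

def em_gmm_1d(data, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM. Each 'code word' is
    a (weight, mean, variance) triple instead of a centroid vector."""
    n = len(data)
    srt = sorted(data)
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]  # quantile init
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6)
    return weights, means, variances

# Two well-separated clusters; EM should recover means near 0 and 10.
import random
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(10.0, 1.0) for _ in range(300)])
w, m, v = em_gmm_1d(data, k=2)
print(sorted(m))
```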
Title: Generalized magnitude and power complementary filters
Authors: S. R. Pillai, G. H. Allen
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389959
Abstract: A new higher-order generalization of magnitude and power complementary filters is proposed. The proposed scheme is shown to have superior frequency characteristics compared with ordinary complementary filters. Applications of these generalized complementary filters include subband coding for audio and video and sharpening the amplitude characteristics of digital filters. Interestingly, this new design procedure can also be used to generate ordinary multichannel magnitude and power complementary filters with sharper band responses.
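For reference, a pair H, G is power complementary when |H(e^jw)|^2 + |G(e^jw)|^2 = 1 at every frequency. A minimal numeric check with the classic 2-tap averaging/differencing pair follows; this illustrates the defining property, not the paper's higher-order construction.

```python
import cmath
import math

def freq_response(b, omega):
    """Frequency response of an FIR filter with coefficients b at radian frequency omega."""
    return sum(bk * cmath.exp(-1j * omega * k) for k, bk in enumerate(b))

# A classic power-complementary pair: 2-tap averaging and differencing filters.
h = [0.5, 0.5]   # lowpass:  |H(w)|^2 = cos^2(w/2)
g = [0.5, -0.5]  # highpass: |G(w)|^2 = sin^2(w/2)

# |H|^2 + |G|^2 should equal 1 at every frequency sampled below.
for i in range(9):
    w = math.pi * i / 8
    total = abs(freq_response(h, w)) ** 2 + abs(freq_response(g, w)) ** 2
    print(f"omega={w:.3f}  |H|^2+|G|^2={total:.6f}")
```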
Title: Segmental phoneme recognition using piecewise linear regression
Authors: S. Krishnan, P. Rao
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389358
Abstract: We propose an efficient, self-organizing segmental measurement based on a piecewise linear regression (PLR) fit of short-term measurement trajectories. The advantages of this description are: (i) it decouples temporal measurements from the recognition strategy; and (ii) it requires less computation than conventional methods. Acoustic context can also be easily integrated into this framework. The PLR measurements are cast into a stochastic segmental framework for phoneme classification; we show that this requires static classifiers for each regression component. Finally, we evaluate the approach on a phoneme recognition task using the TIMIT database, showing that the PLR description is a computationally simple alternative to existing approaches.
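The core PLR operation, fitting a short feature trajectory with a few line segments, can be sketched as a two-piece least-squares fit over all candidate breakpoints. This is an illustrative simplification under assumed names; the paper's segmental framework is richer.

```python
def linfit(xs, ys):
    """Ordinary least-squares line fit; returns (slope, intercept, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def two_piece_plr(xs, ys):
    """Fit two linear pieces, choosing the breakpoint that minimizes total SSE."""
    best = None
    for b in range(2, len(xs) - 1):  # each piece needs at least 2 points
        _, _, e1 = linfit(xs[:b], ys[:b])
        _, _, e2 = linfit(xs[b:], ys[b:])
        if best is None or e1 + e2 < best[1]:
            best = (b, e1 + e2)
    return best  # (breakpoint index, total squared error)

# A trajectory that rises then falls: the breakpoint lands at the peak.
xs = list(range(10))
ys = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
print(two_piece_plr(xs, ys))
```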
Title: Detection of nonstationary random signals in colored noise
Authors: W. Padgett, Douglas B. Williams
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389738
Abstract: This paper describes a novel method for detecting nonstationary signals in colored noise. A first-order complex autoregressive, or AR(1), signal model is used, which restricts the detector to low-order signals, i.e., those that are well modeled by a low-order AR process and have only a single spectral peak. The detector assumes that the noise covariance is stationary and known. The likelihood function is estimated in the frequency domain, where the model simplifies, and the nonstationary frequency estimate is obtained by an algorithm that approximates the Viterbi algorithm. The AR model parameters are then used to form the appropriate covariance matrix, and the approximate likelihood is calculated. The detector thus uses efficient approximations to the generalized likelihood ratio test (GLRT). Simulation results compare the detector with the known-signal likelihood ratio test.
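The benchmark mentioned at the end, the likelihood ratio test for a completely known signal in white Gaussian noise, reduces to a correlator (matched filter). Below is a hedged sketch of that baseline only, not of the paper's AR(1)/GLRT detector; the signal and noise parameters are assumptions for the demo.

```python
import math
import random

def matched_filter_statistic(x, s, noise_var=1.0):
    """Log-likelihood-ratio statistic for detecting a KNOWN signal s in
    white Gaussian noise: T(x) = (s . x) / sigma^2, monotone in the LRT."""
    return sum(si * xi for si, xi in zip(s, x)) / noise_var

rng = random.Random(0)
s = [math.sin(2 * math.pi * 0.1 * t) for t in range(100)]      # known signal
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]              # noise only
observed = [n + si for n, si in zip(noise, s)]                 # signal + noise

# With the same noise realization, the statistic under "signal present"
# exceeds "noise only" by s.s (about 50 for this 100-sample sinusoid).
print(matched_filter_statistic(observed, s)
      - matched_filter_statistic(noise, s))
```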
Title: A frequency domain filtering method for generation of long complex Gaussian sequences with required spectra
Authors: Weimin Zhang
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389804
Abstract: The frequency-domain filtering method treats i.i.d. Gaussian samples as frequency-domain samples, so that out-of-band samples need not be generated. It is efficient when the Doppler bandwidth is low compared with the sampling rate, as in simulations of multipath fading channels. A proposed time-domain smooth-joining scheme keeps the mean and variance unchanged and controls the distortion of the power spectral density. The data window used in the joining sections is a cosine window, and the resulting autocorrelation window ranges from triangular to Papoulis, depending on the degree of overlap. The spectral distortion is traded off against computational efficiency. The method can be viewed as the reciprocal of power spectrum estimation.
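The core idea above can be sketched directly: draw i.i.d. Gaussian samples in the frequency domain, scale them by the square root of the target PSD (leaving out-of-band bins exactly zero), and inverse-transform. This sketch omits the paper's time-domain smooth-joining scheme and uses a slow O(n^2) inverse DFT for clarity; the function name and band-limited PSD are assumptions.

```python
import cmath
import math
import random

def gaussian_sequence(psd, seed=0):
    """One realization of a complex Gaussian sequence whose power spectral
    density follows `psd` (one value per DFT bin): i.i.d. complex Gaussian
    bins are shaped by sqrt(PSD) and inverse-transformed."""
    rng = random.Random(seed)
    n = len(psd)
    spec = [math.sqrt(p / 2) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
            for p in psd]
    # Unitary inverse DFT (an FFT would be used in practice).
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
            / math.sqrt(n)
            for t in range(n)]

# Band-limited PSD: energy only in the lowest eighth of the bins, mimicking
# a low Doppler bandwidth relative to the sampling rate.
n = 256
psd = [1.0 if k < n // 8 else 0.0 for k in range(n)]
x = gaussian_sequence(psd)
print(round(sum(abs(v) ** 2 for v in x) / n, 3))  # average power
```

Because zero-power bins are never populated, the sequence contains no out-of-band energy at all, which is the efficiency argument in the abstract.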
Title: Speech enhancement based on a new set of auditory constrained parameters
Authors: S. Nandkumar, J. Hansen
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389370
Abstract: A speech enhancement technique is proposed in which auditory properties of perception are investigated for robust speech characterization and improved speech quality in additive background noise. Constraints based on a novel auditory spectral representation are developed within a dual-channel iterative Wiener filtering framework. The aspects of audition modeled in the spectral representation include critical-band filtering, intensity-to-loudness conversion, and lateral inhibition. The auditory transformations and perceptual constraints are shown to yield an improved set of auditory-constrained and enhanced linear prediction (ACE-LP) parameters. Objective measures and informal listening tests show improved speech quality for both white Gaussian and colored noise. The consistency of the quality improvement is illustrated over time and across all phonemes in a set of phonetically labeled TIMIT sentences.
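At the heart of any iterative Wiener filtering framework is the per-frequency gain Ps/(Ps + Pn); the auditory-constrained parameters described above shape the spectral estimates that feed this gain. A minimal sketch of the gain alone follows (illustrative, not the ACE-LP system; the bin values are made up for the demo).

```python
def wiener_gain(signal_psd, noise_psd):
    """Per-frequency Wiener filter gain H(w) = Ps(w) / (Ps(w) + Pn(w))."""
    return [ps / (ps + pn) for ps, pn in zip(signal_psd, noise_psd)]

# Three illustrative bins at SNRs of 4, 1, and 0.25 against unit noise power:
gains = wiener_gain([4.0, 1.0, 0.25], [1.0, 1.0, 1.0])
print(gains)  # high-SNR bins pass nearly unchanged, noisy bins are attenuated
```

In the iterative scheme, enhanced speech from one pass yields a better Ps estimate for the next, and the constraints keep those estimates perceptually plausible.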
Title: 3-D image coding based on affine transform
Authors: T. Fujii, H. Harashima
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389446
Abstract: This paper is concerned with the data compression and interpolation of multi-view images. We propose an affine-based disparity compensation built on a geometric relationship. We first investigate the geometric relationship between a point in object space and its projection onto a view image. We then propose disparity compensation based on the affine transform, which utilizes the geometric constraints between view images. In this scheme, multi-view images are compressed into the structure and texture of triangular patches. Because the geometric relationship is taken into account, the scheme not only compresses the multi-view image but also synthesizes view images from any viewpoint in the viewing zone. Finally, we report an experiment in which 19 view images were used as the original multi-view image and the amount of data was reduced to 1/19 with an SNR of 34 dB.
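The basic building block of affine disparity compensation over triangular patches is the 2-D affine map determined by a patch's three vertex correspondences (6 equations, 6 unknowns). A sketch of that solve follows; it illustrates the affine-transform machinery only, not the paper's full coding scheme, and the function name is an assumption.

```python
def affine_from_triangle(src, dst):
    """Solve for the 2-D affine map (a, b, tx, c, d, ty), with
    u = a*x + b*y + tx and v = c*x + d*y + ty, that carries the three
    src vertices onto the three dst vertices."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)

    def solve_row(r0, r1, r2):
        # Cramer's rule on [[x0,y0,1],[x1,y1,1],[x2,y2,1]] p = (r0,r1,r2).
        da = r0 * (y1 - y2) - y0 * (r1 - r2) + (r1 * y2 - r2 * y1)
        db = x0 * (r1 - r2) - r0 * (x1 - x2) + (x1 * r2 - x2 * r1)
        dt = (x0 * (y1 * r2 - y2 * r1) - y0 * (x1 * r2 - x2 * r1)
              + r0 * (x1 * y2 - x2 * y1))
        return da / det, db / det, dt / det

    a, b, tx = solve_row(dst[0][0], dst[1][0], dst[2][0])
    c, d, ty = solve_row(dst[0][1], dst[1][1], dst[2][1])
    return a, b, tx, c, d, ty

# A pure translation by (2, 3): expect a = d = 1, b = c = 0, tx = 2, ty = 3.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (3.0, 3.0), (2.0, 4.0)]
print(affine_from_triangle(src, dst))
```

One such map per triangular patch is what lets a view be warped to a neighboring viewpoint, which is how the scheme synthesizes intermediate views.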
Title: On the importance of the microphone position for speech recognition in the car
Authors: J. Smolders, T. Claes, Gert Sablon, Dirk Van Compernolle
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389264
Abstract: One of the problems with speech recognition in the car is the position of the far-talk microphone. This position implies not only more or less noise, coming from the car (engine, tires, ...) or from other sources (traffic, wind, ...), but also a different acoustical transfer function. To compare microphone positions in the car, we recorded a multi-speaker database in a car with 7 different microphone positions and compared the positions on the basis of SNR and recognition rate. The position at the ceiling directly in front of the speaker gave the best results.
Title: Simultaneous 3-D motion estimation and wire-frame model adaptation including photometric effects for knowledge-based video coding
Authors: G. Akar, A. Tekalp, L. Onural
Venue: Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing
Pub Date: 1994-04-19 | DOI: 10.1109/ICASSP.1994.389400
Abstract: We address the problem of 3-D motion estimation in the context of knowledge-based coding of facial image sequences. The proposed method simultaneously handles global and local motion estimation and the adaptation of a generic wire-frame to a particular speaker, within an optical-flow-based framework that includes the photometric effects of motion. We use a flexible wire-frame model whose local structure is characterized by the normal vectors of the patches, which are related to the coordinates of the nodes. Geometrical constraints describing the propagation of node movements are introduced and efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm determines the optimum global motion estimates and the parameters describing the structure of the wire-frame model. The motion and structure parameters are initialized with a modified feature-based algorithm. Experimental results with simulated facial image sequences are given.