Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625591
F. Grigoras, H. Teodorescu, V. Apopei
Several techniques from the analysis of nonlinear dynamic systems are applied to reveal and analyse some of the short-term nonlinear, nonstationary characteristics of speech signal production. A new method of speech signal decomposition is introduced.
Title: Analysis of nonlinear and nonstationary processes in speech production. Venue: Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625629
M. Joho, G. Moschytz
In this paper an adaptive broadband beamformer is presented which is based on a partitioned frequency-domain least-mean-square algorithm (PFDLMS). This block algorithm is known for its efficient computation and fast convergence even when the input signals are correlated. In applications where long filters are required but only a small processing delay is allowed, a frequency domain adaptive beamformer without partitioning demands a large FFT length despite the small block size. The FFT length can be shortened significantly by filter partitioning, without increasing the number of FFT operations. The weaker requirement on the FFT size makes the algorithm attractive for acoustical applications.
Title: Adaptive beamforming with partitioned frequency-domain filters
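The FFT-length argument in the abstract above can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the usual overlap-save constraint (FFT length at least partition length plus block size minus one, rounded up to a power of two), and the function names and the example sizes are hypothetical.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def fft_length(filter_len: int, block_size: int, partitions: int = 1) -> int:
    """FFT length needed when the filter is split into equal partitions.

    Assumption: linear convolution of one block with one partition must fit,
    i.e. fft_len >= partition_len + block_size - 1.
    """
    if filter_len % partitions != 0:
        raise ValueError("filter_len must be divisible by partitions")
    partition_len = filter_len // partitions
    return next_pow2(partition_len + block_size - 1)


if __name__ == "__main__":
    # Hypothetical sizes: a 1024-tap filter with a 128-sample block (small delay).
    print(fft_length(1024, 128))     # no partitioning -> 2048
    print(fft_length(1024, 128, 8))  # eight 128-tap partitions -> 256
```

With these illustrative sizes, the small block alone does not shorten the transform; only splitting the filter does.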
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625589
C. Avendaño, H. Hermansky
In this paper we report results obtained by applying temporal processing to speech signals. We describe the properties that make temporal processing an interesting and useful technique for alleviating the harmful effects of environmental factors on speech. Although temporal processing has been used in the past, its properties have not been analysed in detail. We summarize some results of a detailed analysis and describe a data-driven technique for designing the processing. We demonstrate a speech enhancement system that illustrates some properties, advantages, and shortcomings of the technique.
Title: On the properties of temporal processing for speech in adverse environments
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625636
J. Huang, N. Ohnishi, N. Sugie
This paper presents a model-based method for sound localization of concurrent and continuous speech sources in a reverberant environment. A new algorithm adopted from the echo-avoidance model of the precedence effect was used to detect echo-free onsets by specifying a generalized impulse-response pattern. Fine-structure time differences were calculated from the zero-crossing points at different microphones and integrated into an azimuth histogram using the restrictions between them. Two sound sources were localized in both an anechoic chamber and a normal room whose walls, floor and ceiling are made of concrete. The time segment needed for localization was 0.5 to 2 seconds, and the accuracy was a few degrees in both environments.
Title: Sound localization of concurrent and continuous speech sources in reverberant environment
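To make the step from time differences to an azimuth histogram concrete, here is a minimal sketch. It is a simplification and not the authors' algorithm: it assumes a single microphone pair, a far-field source (azimuth = arcsin(c·Δt/d)), and a speed of sound of 343 m/s; the function names and bin width are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def azimuth_deg(delta_t: float, mic_distance: float) -> float:
    """Far-field azimuth in degrees for one inter-microphone time difference."""
    s = SPEED_OF_SOUND * delta_t / mic_distance
    s = max(-1.0, min(1.0, s))  # clamp small overshoots caused by measurement noise
    return math.degrees(math.asin(s))


def azimuth_histogram(delta_ts, mic_distance, bin_width_deg=5.0):
    """Count azimuth estimates per bin; each key is the lower edge of its bin."""
    hist = {}
    for dt in delta_ts:
        az = azimuth_deg(dt, mic_distance)
        edge = math.floor(az / bin_width_deg) * bin_width_deg
        hist[edge] = hist.get(edge, 0) + 1
    return hist
```

Peaks in such a histogram would then be read as candidate source directions; combining several microphone pairs, as the abstract describes, is outside this sketch.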
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625622
Keith D. Martin
Neurophysiological evidence suggests that the so-called precedence effect is a composite of multiple phenomena, in particular echo suppression and "active" mechanisms that build up and release suppression. The authors propose a simple functional model of echo suppression in a population of low-frequency ITD-sensitive neurons in the inferior colliculus. Their model is based on Zurek's 1987 proposal, and they show that it is consistent with Zurek's 1980 psychophysical data by presenting the results of two experiments. The current model is extensible to other localization cues represented by rate-place codes, and the authors suggest that a model such as this is a necessary component of computational models of spatial hearing.
Title: Echo suppression in a computational model of the precedence effect
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625626
K.H. Lehn
The psychophysically based modeling approach of computational auditory scene analysis helps to understand the human auditory system and contributes to the improvement of technical acoustical systems, e.g. hearing aids and hands-free telephony. In the present paper, primitive auditory scene analysis (Bregman 1990) is characterized as a cluster analysis problem. This leads to a system based on a temporal fuzzy cluster analysis that is capable of reproducing psychoacoustical streaming experiments. Moreover, monaural and binaural features can be combined effectively to produce a robust segmentation of auditory scenes. This also facilitates the separation of the original sound source signals.
Title: Modeling binaural auditory scene analysis by a temporal fuzzy cluster analysis approach
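For readers unfamiliar with fuzzy clustering, the sketch below shows the membership rule of standard fuzzy c-means, in which each feature vector belongs to every cluster to a degree between 0 and 1. This is the textbook rule, not the temporal variant developed in the paper; the function name, the fuzzifier m = 2, and the feature vectors are assumptions for illustration.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means memberships of one point in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with d_i the distance
    from the point to center i. Memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: share membership among them.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
```

A feature vector could, for example, stack a monaural cue with a binaural cue, so that both contribute to the same distance.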
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625630
J. G. Ryan, R. Goubran
This paper describes the effects of optimizing the weights of an arbitrary microphone array for near-field target locations. Optimum near-field weights are shown to provide increased gain for near-field sources when compared to a uniformly weighted delay-and-sum beamformer. Practical improvements in array gain due to constrained optimization are shown to be greatest at locations close to the array and for microphone spacing which is small in relation to the operating wavelength.
Title: Optimum near-field response for microphone arrays
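As a point of reference for the comparison in the abstract above, the following sketch computes the response of a uniformly weighted delay-and-sum beamformer focused on a near-field point. It covers only this baseline, not the optimized weights of the paper. It assumes a free-field point-source model (amplitude 1/r, phase -kr) at a single frequency; the function names and geometry are hypothetical.

```python
import cmath
import math


def steering_weights(mics, focus, freq_hz, c=343.0):
    """Uniform delay-and-sum weights that phase-align a source at `focus`."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber
    m = len(mics)
    return [cmath.exp(1j * k * math.dist(mic, focus)) / m for mic in mics]


def array_response(weights, mics, source, freq_hz, c=343.0):
    """Complex array output for a unit point source at `source`."""
    k = 2.0 * math.pi * freq_hz / c
    total = 0j
    for w, mic in zip(weights, mics):
        r = math.dist(mic, source)
        total += w * cmath.exp(-1j * k * r) / r  # spherical spreading: 1/r
    return total
```

At the focal point the phases cancel exactly, so the response magnitude equals the mean of 1/r over the microphones; an optimized weighting would be judged against this figure.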
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625606
M. Lorber, R. Hoeldrich
This paper deals with broadband noise reduction for the restoration of audio recordings. The signals are processed in the frequency domain using the short-time Fourier transform. A method based on nonlinear spectral subtraction is presented. To prevent the annoying musical noise caused by the noise suppression process, over-subtraction is applied to the degraded signal spectrum: depending on the estimated signal-to-noise ratio (SNR), more than the average noise spectrum is subtracted. A masking threshold obtained by spectral smoothing of the degraded signal spectrum is used to determine the SNR, and time-averaging is then applied to the SNR. Averaging in both the time and frequency domains reduces the SNR variance and therefore also the audible processing distortions.
Title: A combined approach for broadband noise reduction
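The idea of SNR-dependent over-subtraction can be sketched per frequency bin as follows. This is a generic illustration, not the authors' method: the linear mapping from SNR to the over-subtraction factor, the spectral floor, the recursive time-averaging, and every parameter value are assumptions, and the masking-threshold SNR estimate from the abstract is not modelled.

```python
def oversubtraction_factor(snr_db, alpha_max=5.0, alpha_min=1.0,
                           snr_lo=-5.0, snr_hi=20.0):
    """Factor falls linearly from alpha_max (low SNR) to alpha_min (high SNR)."""
    if snr_db <= snr_lo:
        return alpha_max
    if snr_db >= snr_hi:
        return alpha_min
    frac = (snr_db - snr_lo) / (snr_hi - snr_lo)
    return alpha_max + frac * (alpha_min - alpha_max)


def smooth_snr(previous_db, current_db, lam=0.8):
    """Recursive time-averaging of the SNR estimate (assumed first-order form)."""
    return lam * previous_db + (1.0 - lam) * current_db


def subtract_bin(noisy_mag, noise_mag, snr_db, floor=0.01):
    """Subtract alpha times the noise magnitude, keeping a small spectral floor."""
    alpha = oversubtraction_factor(snr_db)
    cleaned = noisy_mag - alpha * noise_mag
    return max(cleaned, floor * noisy_mag)
```

In use, each STFT frame's magnitudes would pass through `subtract_bin` with a smoothed SNR, and the noisy phase would be reused for resynthesis.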
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625585
Il-Taek Lim, J. Bahn
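We perform a fixed-point analysis for the Dolby AC-3 audio decoding algorithm, and determine the suitable multiplier wordlength (say, N) satisfying the required sound quality. Then, based on similar simulations, we try to reduce the accumulator wordlength from the usual (8+2N) to (g+N+r), where g is the wordlength for overflow guard bits and r is the wordlength for rounding, with the condition r < N. To show that r-bit rounding is enough, error signal waveforms are shown which are obtained by subtracting the PCM samples generated by the floating-point simulation from the PCM samples generated by the r-bit rounded fixed-point simulation. In addition, frequency magnitude plots of the decoded PCM samples are shown and compared with those of the floating-point decoded PCM samples.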
Title: Fixed-point analysis and simulations of AC-3 algorithm
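The wordlength bookkeeping in the abstract above can be checked with a short worked example. The sketch assumes g = 8 guard bits so that the two accumulator sizes are directly comparable, and it uses round-half-up when discarding the low N - r bits of a product; neither choice is stated in the source, and the function names are hypothetical.

```python
def accumulator_bits_full(n: int, guard: int = 8) -> int:
    """Usual accumulator width: guard bits plus the full 2N-bit product."""
    return guard + 2 * n


def accumulator_bits_reduced(n: int, r: int, guard: int = 8) -> int:
    """Reduced width g + N + r, valid only for 0 <= r < N."""
    if not 0 <= r < n:
        raise ValueError("r must satisfy 0 <= r < N")
    return guard + n + r


def round_product(product: int, n: int, r: int) -> int:
    """Keep the top N + r bits of a 2N-bit product, rounding half up."""
    drop = n - r  # number of low-order bits discarded (>= 1 because r < N)
    return (product + (1 << (drop - 1))) >> drop


if __name__ == "__main__":
    # Hypothetical setting: N = 20-bit multiplier, r = 4 rounding bits.
    print(accumulator_bits_full(20))        # 48
    print(accumulator_bits_reduced(20, 4))  # 32
```

For these illustrative values the accumulator shrinks from 48 to 32 bits; whether the resulting error is acceptable is what the paper's simulations address.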
Pub Date: 1997-10-19 DOI: 10.1109/ASPAA.1997.625599
R. Maher
A stereophonic enhancement system is described which expands the perceived width of the stereo sound image. The system accepts electrical audio signals comprising a two-channel (left and right) stereo pair and produces enhanced left and right stereo signals for use with conventional two-channel audio recording and playback systems. The system includes control circuitry which monitors the dissimilarity of the left and right input signals and optionally the dissimilarity of the left and right processed output signals. An all-pass decorrelation subsystem can also be included for mono-to-stereo conversion of monophonic input signals prior to the spatial enhancement system.
Title: Single-ended spatial enhancement using a cross-coupled lattice equalizer