Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625594
J. Huopaniemi, L. Savioja, M. Karjalainen
A method is presented for modeling sound propagation in rooms using a signal processing approach. Low-order digital filters are designed to match sound propagation transfer functions calculated from boundary material and air absorption data. The technique is applied to low-frequency finite-difference time-domain (FDTD) simulation of room acoustics and to real-time image-source-based virtual acoustics.
Title: Modeling of reflections and air absorption in acoustical spaces: a digital filter design approach. Published in: Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics.
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625625
Daniel P. W. Ellis
The field of computational auditory scene analysis (CASA) strives to build computer models of the human ability to interpret sound mixtures as the combination of distinct sources. A major obstacle to this enterprise is defining and incorporating the kind of high level knowledge of real-world signal structure exploited by listeners. Speech recognition, while typically ignoring the problem of nonspeech inclusions, has been very successful at deriving powerful statistical models of speech structure from training data. In this paper, we describe a scene analysis system that includes both speech and nonspeech components, addressing the problem of working backwards from speech recognizer output to estimate the speech component of a mixture. Ultimately, such hybrid approaches will require more radical adaptation of current speech recognition approaches.
Title: Computational auditory scene analysis exploiting speech-recognition knowledge
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625637
M. Brandstein
Generalized cross-correlation (GCC) has been the traditional method for estimating the relative time delay between speech signals received by a pair of microphones in a reverberant, noisy environment. The filtering criterion employed focuses either on the signal degradations due to additive noise or on those due exclusively to multipath channel effects. There has been relatively little success in applying GCC weighting schemes that are robust to both conditions. This paper details an alternative approach which employs a signal-dependent criterion, namely the estimated periodicity of harmonic spectral intervals, to design a GCC filter appropriate for the combination of noise and multipath signal distortions. Simulations are performed across a range of room conditions to illustrate the utility of the proposed time-delay estimation method relative to conventional GCC filtering approaches.
Title: A pitch-based approach to time-delay estimation of reverberant speech
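For orientation, the conventional GCC baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration using the common phase transform (PHAT) weighting, not the pitch-based weighting the paper proposes. The function names (`dft`, `gcc_phat_delay`), the naive O(N²) transform, and the synthetic noise signal are all assumptions made for this example.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short demo frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x1, x2, max_lag):
    """Estimate by how many samples x2 lags x1, using PHAT-weighted GCC."""
    n = len(x1) + len(x2)  # zero-pad so circular correlation equals linear correlation
    X1 = dft(list(x1) + [0.0] * (n - len(x1)))
    X2 = dft(list(x2) + [0.0] * (n - len(x2)))
    cross = []
    for a, b in zip(X1, X2):
        c = b * a.conjugate()          # cross-power spectrum
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)  # PHAT: keep phase only
    r = dft(cross, inverse=True)       # generalized cross-correlation
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: r[lag % n].real)


# Synthetic check: the second channel is the first delayed by 5 samples.
random.seed(0)
x1 = [random.gauss(0.0, 1.0) for _ in range(32)]
x2 = [0.0] * 5 + x1
delay = gcc_phat_delay(x1, x2, max_lag=10)
print(delay)  # prints 5
```

The phase-only weighting sharpens the correlation peak for a clean delay, but it treats all frequency bins equally, which is one reason such weightings can struggle when noise and reverberation occur together.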
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625583
W. Knecht
This paper discusses some important issues concerning feedback cancellation in hearing aids using adaptive filters. We summarize the main benefits of an anti-feedback system and comment on the pros and cons of a frequency-domain implementation. We address the length of the adaptive filter and describe important delay elements in the algorithm. Finally, we comment on the problem of adaptation in the presence of uncorrelated additive noise which, in this case, is the desired input signal to the hearing aid.
Title: Some notes on feedback suppression with adaptive filters in hearing aids
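The adaptation problem described in the abstract can be illustrated with a small sketch. It assumes a normalized LMS (NLMS) update and a simplified open-loop setup in which the loudspeaker signal is white noise. The abstract does not specify the paper's algorithm, and the feedback path `true_path`, the step size, and the signal levels are hypothetical values chosen for the example. The incoming sound `s` plays the role of the uncorrelated additive noise that disturbs adaptation.

```python
import random


def nlms_identify(u, d, num_taps, mu=0.1, eps=1e-8):
    """Adapt an FIR estimate of the path from u to d with normalized LMS."""
    w = [0.0] * num_taps      # current estimate of the feedback path
    buf = [0.0] * num_taps    # most recent loudspeaker samples, newest first
    errors = []
    for u_n, d_n in zip(u, d):
        buf = [u_n] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted feedback
        e = d_n - y                                  # microphone minus prediction
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


random.seed(1)
true_path = [0.0, 0.5, -0.3, 0.1]                        # hypothetical feedback path
u = [random.gauss(0.0, 1.0) for _ in range(4000)]        # loudspeaker signal
s = [0.05 * random.gauss(0.0, 1.0) for _ in range(4000)] # desired incoming sound

# Microphone signal: feedback through the path plus the desired input.
d = []
for n in range(len(u)):
    fb = sum(true_path[k] * u[n - k] for k in range(len(true_path)) if n - k >= 0)
    d.append(fb + s[n])

w_hat, err = nlms_identify(u, d, num_taps=4, mu=0.1)
print([round(w, 2) for w in w_hat])
```

After convergence the error signal `err` approximates the desired input `s`, so the input itself acts as the disturbance that keeps the weights fluctuating. In a real hearing aid the loudspeaker signal is derived from the microphone signal, so the two are correlated, which this open-loop sketch deliberately ignores.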
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625624
P. Divenyi, R. Carré, A. P. Algazi
Experiments were conducted to determine the extent to which a fundamental frequency or formant frequency transition influenced segregation of a simultaneous pair of single-formant harmonic complexes. Results showed that even a minute transition facilitated segregation. The effect was larger for formant frequency than fundamental frequency transitions. It is concluded that dynamic aspects of speech must be taken into account when explaining auditory scene analysis by humans and when designing computational scene analysis methods.
Title: Auditory segregation of vowel-like sounds with static and dynamic spectral properties
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625578
J. Fernandez Ramos, J. C. Tejero, J. A. Hidalgo, M. J. Martin, A. Gago
This paper describes the design and evaluation of a circuit which performs the compression of narrow-band signals within a multiband analog system for a hearing aid. The system has twelve narrow-band modules, each formed by four stages. The first stage is a band-pass filter which selects the bandwidth of the module. The second stage, the object of this paper, is a compression circuit which performs a nonlinear operation that removes the attack and release times typical of automatic gain control systems. The third stage is another band-pass filter like the first, whose function is to reduce the distortion produced by the compression stage. The fourth stage is a controlled linear gain stage. The simulation and experimental results show that the compression circuit has good accuracy within the dynamic range of speech signals. The output narrow-band filter greatly reduces the harmonic and intermodulation distortion inherent in all compression systems when the input signal has important formants very close together.
Title: Compression circuit of a multiband analog system for hearing aid
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625639
H. Wang, P. Chu
This paper describes the voice source localization algorithm used in the PictureTel automatic camera pointing system (LimeLight™, dynamic speech locating technology). The system uses an array 46 cm wide and 30 cm high, which contains 4 microphones and is mounted on top of the monitor. The three-dimensional position of a sound source is calculated from the time delays of 4 pairs of microphones. In time delay estimation, the averaging of signal onsets of each frequency band is combined with phase correlation to reduce the influence of noise and reverberation. With this approach, it is possible to provide reliable three-dimensional voice source localization with a small microphone array. Post-processing based on a priori knowledge is also introduced to eliminate the influence of reflections from furniture such as tables. Results of speech source localization under real conference room conditions are given. Some system-related issues are also discussed.
Title: Voice source localization for automatic camera pointing system in videoconferencing
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625604
F. Kurth, M. Clausen
This article discusses M-band wavelet packets which combine the well-known construction of 2-band wavelet packets with concepts of M-band wavelet theory. To make the resulting tilings of the time-frequency plane even more flexible, the concept of a filter bank tree (FBT) is presented. Within this framework the design of decimated filter bank cascades, realizing some arbitrary time-frequency tiling, is possible. Extensive tests on the denoising of high fidelity audio signals and comparison with several standard wavelet packet denoising techniques show the advantages of the new methods. The generality of the concept suggests its application to other problems, not necessarily restricted to the field of audio signal processing.
Title: M-band wavelet packets and filter bank trees as flexible tools in audio signal processing
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625600
P. Depalle, Thomas Hélie
A new method which improves the estimation of frequency, amplitude and phase of the partials of a sound is presented. It allows the reduction of the analysis-window size from four periods to two periods. It therefore gives better accuracy in parameter determination, and has proved to remain efficient at low signal-to-noise ratios. The basic idea consists of using a parametric model of the short-time Fourier transform. The method alternately estimates the complex amplitudes and the frequencies, starting from the result of the classical analysis method. It uses a least-squares procedure and a first-order limited expansion of the model around previous estimates. This method leads us to design new windows which have no sidelobes, in order to help convergence. Finally, an analysis algorithm built according to the observed behavior of the method for various kinds of sound is presented.
Title: Extraction of spectral peak parameters using a short-time Fourier transform modeling and no sidelobe windows
Pub Date: 1997-10-19; DOI: 10.1109/ASPAA.1997.625577
J. A. Hidalgo, J. C. Tejero, A. Daza, O. Oballe, A. Gago
We introduce a core for a digital hearing aid that processes the speech signal to compensate for sensorineural hearing impairment, with the aim of improving intelligibility. The technique is based on digital analysis/synthesis of speech: the input signal is divided into short time blocks, followed by multiband analysis, nonlinear amplification, and synthesis based on a sinusoidal model of the voice, according to the subject's dynamic range in each band. The system works in real time and has been implemented on a single ASIC in 1 µm ES2 technology, including 3 RAM memories with a capacity of 2432 bits and one 16×16 multiplier. The die size is 30.59 mm².
Title: A microelectronic core for a programmable digital hearing aid