"Modeling nonlinear circuits with linearized dynamical models via kernel regression"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701830
Daniel J. Gillespie, D. Ellis
This paper introduces a novel kernel-based method for solving guitar distortion circuits. The proposed algorithm uses a kernel regression framework to linearize the inherently nonlinear dynamical systems created by such circuits, and proposes data and kernel selection algorithms well suited to learning the required regression parameters. Examples are presented using the One-Capacitor Diode Clipper and the Common-Cathode Tube Amplifier.
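A minimal sketch of the general idea, not the paper's algorithm: kernel ridge regression is trained to predict the next output of a clipping circuit from the current input and a short signal history, then run recursively on new input. The `toy_clipper` surrogate, the RBF kernel width, the three-sample feature vector and the random subsampling of training points are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100

# Stand-in "circuit": tanh clipper followed by a one-pole lowpass.  This is a
# toy surrogate used only to generate training data for the sketch; the
# paper's own circuit models (diode clipper, tube stage) differ.
def toy_clipper(x, a=0.3):
    y = np.tanh(5.0 * x)
    out = np.zeros_like(y)
    s = 0.0
    for n, v in enumerate(y):
        s = (1 - a) * s + a * v          # one-pole smoothing (memory element)
        out[n] = s
    return out

x = 0.7 * np.sin(2 * np.pi * 440 * np.arange(4000) / fs) * np.linspace(0.2, 1.0, 4000)
y = toy_clipper(x)

# Features: current and previous input plus previous output (a short state).
X = np.stack([x[1:], x[:-1], y[:-1]], axis=1)
t = y[1:]

def rbf(A, B, gamma=4.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge regression on a random subset (a crude stand-in for data selection).
idx = rng.choice(len(X), 300, replace=False)
Xd, td = X[idx], t[idx]
alpha = np.linalg.solve(rbf(Xd, Xd) + 1e-4 * np.eye(len(Xd)), td)

# Recursive prediction on unseen input, feeding back the model's own output.
x_test = 0.9 * np.sin(2 * np.pi * 220 * np.arange(2000) / fs)
y_hat = np.zeros_like(x_test)
for n in range(1, len(x_test)):
    feat = np.array([[x_test[n], x_test[n - 1], y_hat[n - 1]]])
    y_hat[n] = (rbf(feat, Xd) @ alpha)[0]
```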
{"title":"Modeling nonlinear circuits with linearized dynamical models via kernel regression","authors":"Daniel J. Gillespie, D. Ellis","doi":"10.1109/WASPAA.2013.6701830","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701830","url":null,"abstract":"This paper introduces a novel method for the solution of guitar distortion circuits based on the use of kernels. The proposed algorithm uses a kernel regression framework to linearize the inherent nonlinear dynamical systems created by such circuits and proposes data and kernel selection algorithms well suited to learn the required regression parameters. Examples are presented using the One Capacitor Diode Clipper and the Common-Cathode Tube Amplifier.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129126558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Sine-wave based PSOLA pitch scaling with real-time pitch marking"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701864
R. McAulay
The sinusoidal system was reconfigured to use pitch-synchronous overlap-add (PSOLA) synthesis so that pitch shifting could be achieved by moving the sine-wave parameters to the pitch-shifted synthesis frames. This, in turn, led to a pitch-marking technique based on the sine-wave phases that required no forward-backward searching for epochs, resulting in real-time pitch scaling. Having access to the sine-wave amplitudes led to realistic re-shaping of the vocal-tract characteristic, making the system well suited for real-time pitch scaling and vocal-tract modification of speech.
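For orientation, a conventional time-domain PSOLA pitch shifter with pre-computed pitch marks is sketched below; the paper's method instead derives the marks from sine-wave phases and moves sine-wave parameters to the shifted synthesis frames. The pulse-like test signal, the ideal pitch marks and the Hann grain window are assumptions of this sketch.

```python
import numpy as np

def psola_pitch_shift(x, marks, ratio):
    """Time-domain PSOLA pitch shift given analysis pitch marks (in samples).

    Generic sketch only: two-period grains are taken around the analysis
    marks and overlap-added at synthesis instants spaced by T0 / ratio.
    """
    marks = np.asarray(marks)
    y = np.zeros(len(x))
    t_syn = float(marks[0])
    while t_syn < marks[-2]:
        i = int(np.argmin(np.abs(marks[:-1] - t_syn)))   # nearest analysis mark
        T = int(marks[i + 1] - marks[i])                 # local pitch period
        c = int(marks[i])
        if c - T >= 0 and c + T <= len(x):
            grain = x[c - T:c + T] * np.hanning(2 * T)   # two-period grain
            s = int(round(t_syn)) - T
            if 0 <= s and s + 2 * T <= len(y):
                y[s:s + 2 * T] += grain
        t_syn += T / ratio            # synthesis marks spaced by T0 / ratio
    return y

# Toy usage: a 100 Hz periodic signal at 16 kHz, shifted up by a factor of 1.25.
fs, f0 = 16000, 100
n = np.arange(fs)
x = np.sin(2 * np.pi * f0 * n / fs) * (0.6 + 0.4 * np.sin(2 * np.pi * 3 * n / fs))
marks = np.arange(0, len(x), fs // f0)     # ideal pitch marks, one per period
y = psola_pitch_shift(x, marks, ratio=1.25)
```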
{"title":"Sine-wave based PSOLA pitch scaling with real-time pitch marking","authors":"R. McAulay","doi":"10.1109/WASPAA.2013.6701864","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701864","url":null,"abstract":"The sinusoidal system was reconfigured to use pitch synchronous overlap-add (PSOLA) synthesis so that pitch shifting could be achieved by moving the sine-wave parameters to the pitch-shifted synthesis frames. This, in turn, led to a pitch-marking technique based on the sine-wave phases that required no forward-backward searching for epochs, resulting in real-time pitch scaling. Having access to the sine wave amplitudes led to realistic re-shaping of the vocal tract characteristic, hence the system is well suited for real-time pitch scaling and vocal tract modification of speech.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133248622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"MAP estimation of driving signals of loudspeakers for sound field reproduction from pressure measurements"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701895
Shoichi Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda
Sound field reproduction methods calculate the driving signals of loudspeakers needed to reproduce a desired sound field. In common recording and reproduction systems, the desired sound field is known only as sound pressures measured at multiple positions in a recording room; therefore, algorithms that transform sound pressures into driving signals (SP-DS conversion) are necessary. Although several SP-DS conversion methods have been proposed, they do not take into account a priori information about the recorded sound field. However, approximate positions of the sound sources can be obtained from the received microphone signals or other sensor data. We propose an SP-DS conversion method based on maximum a posteriori (MAP) estimation for cases where the microphone and loudspeaker arrays are planar or linear. The basis functions and their coefficients used to represent the loudspeaker driving signals are optimized based on the prior information about the source positions. Numerical simulation results indicate that the proposed method achieves higher reproduction accuracy than current SP-DS conversion methods, especially at frequencies above the spatial Nyquist frequency.
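A hedged illustration of the MAP idea: with a Gaussian likelihood and a Gaussian prior on the driving signals whose variances are shaped by the coarsely known source position, the estimate reduces to regularized least squares per frequency bin. The array geometries, free-field Green's functions, prior width and noise power below are assumed values, and the diagonal prior is a simple stand-in for the paper's optimized basis functions.

```python
import numpy as np

rng = np.random.default_rng(1)
c, f = 343.0, 1000.0
k = 2 * np.pi * f / c

# Hypothetical linear arrays: 16 loudspeakers and 8 microphones (positions assumed).
spk = np.stack([np.linspace(-1.5, 1.5, 16), np.zeros(16), np.zeros(16)], axis=1)
mic = np.stack([np.linspace(-0.7, 0.7, 8), np.full(8, 1.0), np.zeros(8)], axis=1)

def green(src, rcv):
    r = np.linalg.norm(rcv[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)     # free-field Green's function

G = green(spk, mic)                                   # mics x loudspeakers

# Desired field: a point source assumed (from coarse localization) near x = 0.4 m.
src_pos = np.array([[0.4, -1.0, 0.0]])
p = green(src_pos, mic)[:, 0] + 1e-4 * (rng.standard_normal(8) + 1j * rng.standard_normal(8))

# Gaussian prior on the driving signals: loudspeakers close to the estimated
# source direction get larger prior variance.
w = np.exp(-((spk[:, 0] - src_pos[0, 0]) ** 2) / (2 * 0.3 ** 2))
Sigma_inv = np.diag(1.0 / (w + 1e-3))
sigma2 = 1e-6                                         # measurement noise power (assumed)

# MAP / regularized least squares: d = (G^H G / s2 + Sigma^-1)^-1 G^H p / s2
A = G.conj().T @ G / sigma2 + Sigma_inv
d_map = np.linalg.solve(A, G.conj().T @ p / sigma2)
```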
{"title":"Map estimation of driving signals of loudspeakers for sound field reproduction from pressure measurements","authors":"Shoichi Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda","doi":"10.1109/WASPAA.2013.6701895","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701895","url":null,"abstract":"Sound field reproduction methods calculate driving signals of loudspeakers to reproduce the desired sound field. In common recording and reproduction systems, sound pressures at multiple positions obtained in a recording room are only known as the desired sound field; therefore, signal transformation algorithms from sound pressures into driving signals (SP-DS conversion) are necessary. Although several SP-DS conversion methods have been proposed, they do not take into account a priori information about the recorded sound field. However, approximate positions of sound sources can be obtained by using the received signals of microphones or other sensor data. We propose an SP-DS conversion method based on the maximum a posteriori (MAP) estimation when array configurations of the microphones and loudspeakers are planar or linear. The optimal basis functions and their coefficients for representing driving signals of the loudspeakers are optimized based on the prior information of the source positions. Numerical simulation results indicate that the proposed method can achieve higher reproduction accuracy compared to the current SP-DS conversion methods, especially in higher frequencies above the spatial Nyquist frequency.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"85 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131364759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Optimizing frame analysis with non-integer shift for sampling mismatch compensation of long recording"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701833
S. Miyabe, Nobutaka Ono, S. Makino
This paper proposes a blind synchronization method for ad-hoc microphone arrays in the short-time Fourier transform (STFT) domain, with optimized frame analysis centered at non-integer discrete times. We show that the drift caused by the sampling frequency mismatch between asynchronous observation channels can be disregarded within a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated coarsely by finding two pairs of short intervals corresponding to the same continuous time. Using this estimate, the STFT analysis is roughly synchronized between channels with an optimized frame center. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear-phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation then refines the compensation of the sampling frequency mismatch.
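A small sketch of the non-integer-centered frame analysis: the frame is taken at the nearest integer sample and the residual fractional offset is applied as a linear phase ramp on its DFT bins. The window choice, frame length and the toy 20 ppm clock mismatch are assumptions, not values from the paper.

```python
import numpy as np

def stft_frame_fractional_center(x, center, frame_len):
    """DFT of a frame whose (conceptual) center lies at a non-integer sample
    index: take the frame at the nearest integer center and apply a
    linear-phase (fractional-shift) correction.  Edge handling and window
    choice are simplifications of this sketch."""
    n0 = int(round(center))
    tau = center - n0                     # fractional part, |tau| <= 0.5
    half = frame_len // 2
    frame = x[n0 - half:n0 - half + frame_len] * np.hanning(frame_len)
    X = np.fft.rfft(frame)
    k = np.arange(len(X))
    # advancing the frame by tau samples == multiplying bin k by exp(+j*2*pi*k*tau/N)
    return X * np.exp(1j * 2 * np.pi * k * tau / frame_len)

# Usage on a drifting channel: if the channel clock runs eps = 20 ppm fast,
# its ideal frame centers drift as center = t * (1 + eps) (values assumed).
fs, eps, frame_len = 16000, 20e-6, 1024
x = np.sin(2 * np.pi * 440 * np.arange(10 * fs) / fs)
centers = np.arange(frame_len, 9 * fs, frame_len // 2) * (1 + eps)
frames = [stft_frame_fractional_center(x, c, frame_len) for c in centers]
```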
{"title":"Optimizing frame analysis with non-integrer shift for sampling mismatch compensation of long recording","authors":"S. Miyabe, Nobutaka Ono, S. Makino","doi":"10.1109/WASPAA.2013.6701833","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701833","url":null,"abstract":"This paper proposes a blind synchronization of ad-hoc microphone array in the short-time Fourier transform (STFT) domain with the optimized frame analysis centered at non-integer discrete time. We show that the drift caused by sampling frequency mismatch of asynchronous observation channels can be disregarded in a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of the short intervals corresponding to the same continuous time. Using the estimate, STFT analysis is synchronized roughly between channels with optimized frame central. Since the optimized frame central is generally non-integer, we approximate the frame analysis by the linear phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation refines the compensation of sampling frequency mismatch.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131693935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"An efficient time-varying loudness model"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701884
D. Ward, C. Athwal, M. Köküer
In this paper, we present an efficient loudness model applicable to time-varying sounds. We use the model of Glasberg and Moore (J. Audio Eng. Soc., 2002) as the basis for our developments, proposing a number of optimization techniques to reduce the computational complexity at each stage of the model. Efficient alternatives to computing the multi-resolution DFT, the excitation pattern and the pre-cochlear filter are presented. Absolute-threshold and equal-loudness-contour predictions are computed and compared against both steady-state and time-varying loudness models to evaluate the combined accuracy of these techniques in the frequency domain. Finally, computational costs and loudness errors are quantified for a range of time-varying stimuli, demonstrating that the optimized model executes approximately 50 times faster while remaining within tolerable error bounds.
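As one concrete piece of such a model, the sketch below computes a multi-resolution power spectrum by running several short DFTs of different lengths and keeping a different frequency region from each, roughly in the spirit of Glasberg and Moore's multi-resolution DFT. The segment durations and band edges used here are assumed for illustration and are not the paper's optimized values.

```python
import numpy as np

def multires_power_spectrum(x, fs, t_center):
    """Six parallel DFTs of decreasing length, each supplying one frequency
    region: long segments give fine resolution at low frequencies, short
    segments give good time resolution at high frequencies."""
    seg_ms = [64, 32, 16, 8, 4, 2]                       # assumed durations
    band_edges = [0, 80, 500, 1250, 2540, 4050, fs / 2]  # assumed band edges (Hz)
    freqs, power = [], []
    for ms, lo, hi in zip(seg_ms, band_edges[:-1], band_edges[1:]):
        n = int(fs * ms / 1000)
        start = max(0, int(t_center * fs) - n // 2)
        seg = x[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(seg)) ** 2 / n
        f = np.fft.rfftfreq(n, 1 / fs)
        sel = (f >= lo) & (f < hi)
        freqs.append(f[sel])
        power.append(spec[sel])
    return np.concatenate(freqs), np.concatenate(power)

fs = 32000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)        # 1 s, 1 kHz tone
f, P = multires_power_spectrum(x, fs, t_center=0.5)
```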
{"title":"An efficient time-varying loudness model","authors":"D. Ward, C. Athwal, M. Köküer","doi":"10.1109/WASPAA.2013.6701884","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701884","url":null,"abstract":"In this paper, we present an efficient loudness model applicable to time-varying sounds. We use the model of Glasberg and Moore (J. Audio Eng. Soc., 2002) as the basis for our developments, proposing a number of optimization techniques to reduce the computational complexity at each stage of the model. Efficient alternatives to computing the multi-resolution DFT, excitation pattern and pre-cochlea filter are presented. Absolute threshold and equal loudness contour predictions are computed and compared against both steady-state and time-varying loudness models to evaluate the combined accuracy of these techniques in the frequency domain. Finally, computational costs and loudness errors are quantified for a range of time-varying stimuli, demonstrating that the optimized model can execute approximately 50 times faster within tolerable error bounds.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129979811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Relative transfer function modeling for supervised source localization"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701829
Bracha Laufer-Goldshtein, R. Talmon, S. Gannot
Speaker localization is one of the most prevalent problems in speech processing. Despite significant efforts over the last decades, high reverberation levels still limit the performance of localization algorithms. Furthermore, using conventional localization methods, the information that can be extracted from dual-microphone measurements is restricted to the time difference of arrival (TDOA). In the far-field regime, this is equivalent to estimating either the azimuth or the elevation angle; a full description of the speaker's coordinates requires several microphones. In this contribution we tackle these two limitations by taking a manifold-learning perspective on system identification. We present a training-based algorithm, motivated by the concept of diffusion maps, that aims at recovering the fundamental controlling parameters driving the measurements. This approach turns out to be more robust to reverberation, and capable of recovering the speech source location using merely two microphone signals.
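A compact sketch of the diffusion-maps machinery behind such an approach: build a Gaussian affinity over training feature vectors, normalize it into a Markov matrix, embed via its leading eigenvectors, and localize a new observation by a Nystrom-style extension plus nearest neighbours. The synthetic `rtf_features` stand-in (the paper uses measured relative transfer functions), the kernel scale and the neighbour count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def rtf_features(theta_deg):
    """Synthetic stand-in for relative-transfer-function features; the paper
    builds these from measured RTFs between the two microphones."""
    th = np.atleast_1d(np.deg2rad(theta_deg))
    m = np.arange(1, 5)
    return np.concatenate([np.cos(np.outer(th, m)), np.sin(np.outer(th, m))], axis=1)

# Training set: features at known source azimuths (labels used only at the end).
angles = np.linspace(-60, 60, 121)
feats = rtf_features(angles) + 0.03 * rng.standard_normal((121, 8))

# Diffusion maps: Gaussian affinity, row-normalized Markov matrix, spectral embedding.
d2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)
eps_dm = np.median(d2)
W = np.exp(-d2 / eps_dm)
P = W / W.sum(axis=1, keepdims=True)
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
psi = (vecs[:, order].real * vals[order].real)[:, 1:3]   # drop the trivial eigenvector

# Localization of a new observation: embed it with a Nystrom-style extension,
# then average the azimuths of its nearest training points in diffusion space.
obs = rtf_features(17.0)[0] + 0.03 * rng.standard_normal(8)
w_obs = np.exp(-((feats - obs) ** 2).sum(-1) / eps_dm)
w_obs /= w_obs.sum()
psi_obs = (w_obs @ vecs[:, order].real)[1:3]
nn = np.argsort(((psi - psi_obs) ** 2).sum(-1))[:5]
print("estimated azimuth:", round(float(angles[nn].mean()), 1), "degrees")
```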
{"title":"Relative transfer function modeling for supervised source localization","authors":"Bracha Laufer-Goldshtein, R. Talmon, S. Gannot","doi":"10.1109/WASPAA.2013.6701829","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701829","url":null,"abstract":"Speaker localization is one of the most prevalent problems in speech processing. Despite significant efforts in the last decades, high reverberation level still limits the performance of localization algorithms. Furthermore, using conventional localization methods, the information that can be extracted from dual microphone measurements is restricted to the time difference of arrival (TDOA). Under far-field regime, this is equivalent to either azimuth or elevation angles estimation. Full description of speaker's coordinates necessitates several microphones. In this contribution we tackle these two limitations by taking a manifold learning perspective for system identification. We present a training-based algorithm, motivated by the concept of diffusion maps, that aims at recovering the fundamental controlling parameters driving the measurements. This approach turns out to be more robust to reverberation, and capable of recovering the speech source location using merely two microphones signals.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127405162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Perceptually motivated ANC for hearing-impaired listeners"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701834
E. Durant, Jinjun Xiao, Buye Xu, M. McKinney, Zhang Tao
The goal of noise control in hearing aids is to improve listening perception. In this paper we propose modifying a perceptually motivated active noise control (ANC) algorithm by incorporating a perceptual model into the cost function, resulting in a dynamic residual-noise spectrum shaping technique based on the time-varying residual noise. The perceptual criterion to be minimized could be sharpness, discordance, annoyance, etc. As an illustrative example, we use the loudness perceived by a hearing-impaired listener as the cost function. Specifically, we design the spectrum-shaping filter using the listener's hearing loss and the dynamic residual-noise spectrum. Simulations show significant improvements of 3-4 sones over energy reduction (ER) for severe high-frequency losses, for some common noises whose loudness would be 6-12 sones without processing. However, average loudness across a wide range of noises is only slightly better than with ER, with greater improvements realized as hearing loss increases. We analyze one way in which the algorithm fails and trace it to over-reliance on the common psychoacoustic modelling simplification that auditory channels are independent to a first approximation. This suggests future work that may improve performance.
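As a rough illustration of folding a perceptual model into the cost, the snippet below derives a frequency weighting from the listener's elevated thresholds and the residual-noise spectrum, so that adaptation effort concentrates where the noise is most audible to that listener. The audiogram, noise levels, normal-threshold values and the weighting exponent are all assumed; this is not the paper's loudness model.

```python
import numpy as np

# Hypothetical listener audiogram (hearing loss in dB HL at audiometric
# frequencies) and a measured residual-noise band spectrum; all values assumed.
freqs_hz = np.array([250, 500, 1000, 2000, 4000, 8000])
loss_db  = np.array([10, 15, 25, 45, 65, 70])          # severe high-frequency loss
noise_db = np.array([60, 62, 58, 55, 50, 45])          # residual noise per band

# Elevated threshold: approximate normal-hearing threshold (assumed values)
# plus the individual hearing loss.
thr_normal = np.array([11, 6, 2, -1, -4, 12])
thr_db = thr_normal + loss_db

# Sensation level of the noise in each band; only bands above threshold
# contribute to perceived loudness.
sl_db = np.maximum(noise_db - thr_db, 0.0)

# Error weighting for the ANC cost: put more weight where the noise sits
# furthest above the listener's threshold, so the adaptive filter spends its
# effort where it reduces loudness rather than raw energy.  The exponent is a
# tuning choice, not a value from the paper.
w = (10 ** (sl_db / 10)) ** 0.3
w /= w.max()

print(dict(zip(freqs_hz, np.round(w, 2))))
```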
{"title":"Perceptually motivated ANC for hearing-impaired listeners","authors":"E. Durant, Jinjun Xiao, Buye Xu, M. McKinney, Zhang Tao","doi":"10.1109/WASPAA.2013.6701834","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701834","url":null,"abstract":"The goal of noise control in hearing aids is to improve listening perception. In this paper we propose modifying a perceptually motivated active noise control (ANC) algorithm by incorporating a perceptual model into the cost function, resulting in a dynamic residual noise spectrum shaping technique based on the time-varying residual noise. The perceptual criterion to be minimized could be sharpness, discordance, annoyance, etc. As an illustrative example, we use loudness perceived by a hearing-impaired listener as the cost function. Specifically, we design the spectrum shaping filter using the listener's hearing loss and the dynamic residual noise spectrum. Simulations show significant improvements of 3-4 sones over energy reduction (ER) for severe high-frequency losses for some common noises that would be 6-12 without processing. However, average loudness across a wide range of noises is only slightly better than with ER, with greater improvements realized with increasing hearing loss. We analyze one way in which the algorithm fails and trace it to over-reliance on the common psychoacoustic modelling simplification that auditory channels are independent to a first approximation. This suggests future work that may improve performance.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128149006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Design of arbitrary delay filterbank having arbitrary order for audio applications"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701886
A. Vijayakumar, A. Makur
The literature shows that the design of pth-order analysis filters with qth-order synthesis filters (p ≠ q), together with the flexibility to control the system delay, has never been addressed concomitantly. In this paper, we propose a systematic design for a filterbank of (p, q) order that can have arbitrary delay. Such filterbanks play an important role especially in applications where low-delay, high-quality signals are required, such as digital hearing aids.
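For flavour, here is a generic least-squares synthesis design in which analysis and synthesis filters have different orders and the overall response is pushed toward a pure delay of D samples; it is written for a two-channel non-subsampled bank, so it ignores the aliasing terms the paper's (p, q)-order design must handle. The filter orders, the prototype lowpass and the target delay are assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, q):
    """(len(h)+q-1) x q convolution matrix C such that C @ g == np.convolve(h, g)."""
    col = np.concatenate([h, np.zeros(q - 1)])
    row = np.concatenate([[h[0]], np.zeros(q - 1)])
    return toeplitz(col, row)

# Assumed two-channel (non-subsampled) analysis bank with p-tap filters.
p, q, D = 16, 10, 12                       # analysis order, synthesis order, target delay
n = np.arange(p)
h0 = np.sinc((n - (p - 1) / 2) / 2) * np.hamming(p)
h0 /= h0.sum()                             # lowpass prototype
h1 = h0 * (-1.0) ** n                      # highpass mirror

# Least-squares synthesis design: choose g0, g1 (q taps each) so that
# h0*g0 + h1*g1 is as close as possible to a pure delay of D samples.
A = np.hstack([conv_matrix(h0, q), conv_matrix(h1, q)])
d = np.zeros(p + q - 1)
d[D] = 1.0
g = np.linalg.lstsq(A, d, rcond=None)[0]
g0, g1 = g[:q], g[q:]

recon = np.convolve(h0, g0) + np.convolve(h1, g1)
print("max deviation from a delay-%d impulse: %.3e" % (D, np.abs(recon - d).max()))
```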
{"title":"Design of arbitrary delay filterbank having arbitrary order for audio applications","authors":"A. Vijayakumar, A. Makur","doi":"10.1109/WASPAA.2013.6701886","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701886","url":null,"abstract":"Literature shows that the design criteria of pth order analysis having qth order synthesis filters (p ≠ q) with a flexibility to control the system delay has never been addressed concomitantly. In this paper, we propose a systematic design for a filterbank that can have arbitrary delay with a (p, q) order. Such filterbanks play an important role especially in applications where low delay-high quality signals are required, like a digital hearing aid.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130774493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Under-determined source separation based on power spectral density estimated using cylindrical mode beamforming"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701836
Yusuke Hioka, T. Betlehem
Sound source signals can be separated using Wiener post-filters calculated by estimating the power spectral densities (PSDs) of the sources from the outputs of a set of beamformers. This approach has been shown to be effective in the under-determined case, where the number of sources to be separated exceeds the number of microphones. In this paper, a limit on the maximum number of separable sources is derived, beyond which the problem becomes rank deficient. This study reveals that the number of sources that can be separated simultaneously is related to the order of the beam patterns. Further, using the principles of cylindrical mode beamforming, the performance can be predicted as a function of frequency. The result is consistent with simulations in which the performance of separating music and speech sound sources was quantified.
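A toy single-frequency-bin sketch of the PSD-estimation step: beamformer output powers are expressed as a linear combination of the source PSDs through the squared beam-pattern gains, and the PSDs are recovered by (clipped) least squares before forming a Wiener post-filter. Delay-and-sum beams and free-field steering vectors are used here instead of the paper's cylindrical mode beamformers, and the geometry and source PSDs are assumed.

```python
import numpy as np

M, L = 3, 4                              # microphones, sources (under-determined)
f, c = 2000.0, 343.0
d_mic = 0.04
mic_x = np.arange(M) * d_mic
doas = np.deg2rad([-60, -20, 20, 60])    # assumed source directions

def steer(theta):
    return np.exp(-1j * 2 * np.pi * f * mic_x * np.sin(theta) / c)

# Fixed beamformers: delay-and-sum beams steered at each assumed DOA.
W = np.stack([steer(t) / M for t in doas])            # beams x mics

# True (unknown) source PSDs at this bin, and the resulting beamformer output
# powers (diffuse noise is ignored for brevity).
phi_true = np.array([1.0, 0.3, 2.0, 0.5])
D = np.stack([steer(t) for t in doas], axis=1)        # mics x sources
B = np.abs(W @ D) ** 2                                # beams x sources gain matrix
p_out = B @ phi_true

# PSD estimation: solve p_out = B @ phi for phi via least squares with a
# non-negativity clip; the paper analyses when B becomes rank deficient,
# which bounds how many sources are separable.
phi_hat = np.clip(np.linalg.lstsq(B, p_out, rcond=None)[0], 0, None)

# Wiener post-filter for source 0, applied to the beam steered at it.
k = 0
gain = (B[k, k] * phi_hat[k]) / (B[k] @ phi_hat + 1e-12)
print("estimated PSDs:", np.round(phi_hat, 3), " Wiener gain for source 0:", round(float(gain), 3))
```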
{"title":"Under-determined source separation based on power spectral density estimated using cylindrical mode beamforming","authors":"Yusuke Hioka, T. Betlehem","doi":"10.1109/WASPAA.2013.6701836","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701836","url":null,"abstract":"Sound source signals can be separated using Wiener post-filters calculated by estimating the power spectral densities (PSDs) of sources from the outputs of a set of beamformers. This approach has been shown effective in the under-determined case where the number of sources to be separated exceeds the number of microphones. In this paper, a limit on the maximum number of separable sources is derived beyond which the problem becomes rank deficient. This study reveals the number of sources that can be separated simultaneously is related to the order of the beam patterns. Further, using the principles of cylindrical mode beamforming, the performance can be predicted as a function of frequency. The result is consistent with simulations in which the performance of separating music and speech sound sources was quantified.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124541455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Sound event detection using non-negative dictionaries learned from annotated overlapping events"
Pub Date: 2013-10-01, DOI: 10.1109/WASPAA.2013.6701861
O. Dikmen, A. Mesaros
Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used for this task, but they involve the problem of assigning the separated components to sources. In this paper, we propose a method which bypasses the need to build separate sound models. Instead, non-negative dictionaries for the sound content and its annotations are learned in a coupled manner. In the testing stage, the time activations of the sound-dictionary columns are estimated and used to reconstruct annotations using the annotation dictionary. The method requires no separate training data for the classes, and in general very promising results are obtained using only a small amount of data.
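A minimal sketch of the coupled-dictionary idea using plain Euclidean NMF with multiplicative updates (the paper's divergence and update rules may differ): the spectrogram and the annotation matrix are stacked and factorized with shared activations, and at test time only the activations of the sound dictionary are estimated and passed through the annotation dictionary. The toy data, the annotation weight `lam`, the rank and the detection threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
eps = 1e-9

def nmf(V, K, n_iter=200, W=None):
    """Multiplicative-update NMF minimizing squared error; if W is given it is
    held fixed and only the activations H are estimated."""
    F, T = V.shape
    fixed_W = W is not None
    W = rng.random((F, K)) if W is None else W
    H = rng.random((K, T))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy training data: a spectrogram V_train (F x T) built from two overlapping
# "events" and a frame-level annotation matrix (classes x T).  Real inputs
# would be mel or DFT magnitudes plus human annotations.
F, T, C = 20, 300, 2
act = (rng.random((C, T)) > 0.7).astype(float)          # event activity
spec_templates = rng.random((F, C)) * np.array([[1.0, 0.2]])
V_train = spec_templates @ act + 0.05 * rng.random((F, T))

# Coupled learning: factorize the stacked matrix so the sound dictionary Ws
# and the annotation dictionary Wa share the same activations.
lam = 1.0                                               # annotation weight (assumed)
S = np.vstack([V_train, lam * act])
W, H = nmf(S, K=8)
Ws, Wa = W[:F], W[F:] / lam

# Test stage: estimate activations of the sound dictionary only, then
# reconstruct the annotations from the annotation dictionary.
V_test = spec_templates @ np.roll(act, 37, axis=1) + 0.05 * rng.random((F, T))
_, H_test = nmf(V_test, K=8, W=Ws)
A_hat = Wa @ H_test                                      # continuous event activity
detections = A_hat > 0.5 * A_hat.max(axis=1, keepdims=True)
```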
{"title":"Sound event detection using non-negative dictionaries learned from annotated overlapping events","authors":"O. Dikmen, A. Mesaros","doi":"10.1109/WASPAA.2013.6701861","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701861","url":null,"abstract":"Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used in this task, but involve the problem of assigning separated components to sources. In this paper, we propose a method which bypasses the need to build separate sound models. Instead, non-negative dictionaries for the sound content and their annotations are learned in a coupled sense. In the testing stage, time activations of the sound dictionary columns are estimated and used to reconstruct annotations using the annotation dictionary. The method requires no separate training data for classes and in general very promising results are obtained using only a small amount of data.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"350 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115895534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}