Structured encoding of the singing voice using prior knowledge of the musical score
Y.E. Kim
Pub Date: 1999-10-17. DOI: 10.1109/ASPAA.1999.810846
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'99)
The human voice is the most difficult musical instrument to simulate convincingly. Yet a great deal of progress has been made in voice coding: the parameterization and re-synthesis of a source signal according to an assumed voice model. Source-filter models of the human voice, particularly linear predictive coding (LPC), are the basis of most low-bit-rate speech coding techniques in use today. This paper introduces a technique for coding the singing voice that uses LPC together with prior knowledge of the musical score to aid the encoding process, reducing the amount of data required to represent the voice. This approach moves the singing voice closer to a structured-audio model in which musical parameters such as pitch, duration, and phonemes are represented orthogonally to the synthesis technique and can therefore be modified prior to re-synthesis.
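The LPC analysis/re-synthesis loop that underlies such a coder can be sketched as follows. This is a minimal illustration only: the synthetic vowel-like source, model order, and sample rate are arbitrary choices, and the paper's score-aware encoding is not shown.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-6 * np.eye(order)  # mild regularization for numerical safety
    return np.linalg.solve(R, r[1:order + 1])  # predictor taps a[0..order-1]

fs, f0, order = 8000, 200, 10
n = np.arange(2048)
# Crude glottal-like source: a pulse train coloured by a one-pole "vocal tract"
x = (n % (fs // f0) == 0).astype(float)
for k in range(1, len(x)):
    x[k] += 0.9 * x[k - 1]

a = lpc_coefficients(x, order)

# Residual via inverse filtering: e[n] = x[n] - sum_k a[k] x[n-1-k]
e = x.copy()
for k in range(order, len(x)):
    e[k] = x[k] - a @ x[k - order:k][::-1]

# Re-synthesis: drive the all-pole filter 1/A(z) with the residual
y = e.copy()
for k in range(order, len(y)):
    y[k] = e[k] + a @ y[k - order:k][::-1]

print(np.max(np.abs(x[order:] - y[order:])))  # near-zero reconstruction error
```

Because the synthesis filter is the exact inverse of the analysis filter, the residual-driven re-synthesis reconstructs the input; a coder gains compression by quantizing the coefficients and modelling the residual instead of transmitting it.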
Psychoacoustical excitation of the (N)LMS algorithm for acoustical system identification
M. Peters
Pub Date: 1999-10-17. DOI: 10.1109/ASPAA.1999.810887
This paper presents an algorithm, and an implementation, that uses orthogonal perfect-correlation sequences for acoustical system identification while exploiting psychoacoustical masking effects. To this end, the common NLMS algorithm is modified to incorporate hidden orthogonal Ipatov and Huffman sequences for fast system identification. Using this method, the speed and accuracy of the identification of the loudspeaker-room-microphone (LRM) system are increased, and the overall performance of echo and noise cancellation is improved.
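The NLMS update that the paper modifies can be sketched in its plain form: a generic NLMS identifier of an unknown FIR path. The random "room" response, white excitation, filter length, and step size are illustrative stand-ins, not the authors' Ipatov/Huffman-excited variant.

```python
import numpy as np

rng = np.random.default_rng(0)

h = rng.standard_normal(16)           # unknown LRM impulse response (FIR stand-in)
x = rng.standard_normal(20000)        # excitation signal
d = np.convolve(x, h)[:len(x)]        # microphone signal (noise-free for clarity)

w = np.zeros_like(h)                  # adaptive filter estimate
mu, eps = 0.5, 1e-8
for n in range(len(h), len(x)):
    u = x[n:n - len(h):-1]            # most recent len(h) input samples, newest first
    e = d[n] - w @ u                  # a-priori error
    w += mu * e * u / (u @ u + eps)   # normalized LMS update

print(np.linalg.norm(w - h) / np.linalg.norm(h))  # small misalignment after adaptation
```

The normalization by the input energy `u @ u` is what makes the step size robust to the excitation level, which is why the choice of excitation sequence (the paper's contribution) can be varied independently of the adaptation rule.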
Immersion and content: a framework for audio research
M. Karjalainen
Pub Date: 1999-10-17. DOI: 10.1109/ASPAA.1999.810852
Audio technology is rapidly expanding in many directions. In addition to the recording, transmission, storage, and reproduction of sounds, it supports practically unlimited modification and generation of sounds and their properties, making it possible to create ever more immersive virtual soundscapes. Audio-related information is also no longer treated merely as signals or data: sound can be analyzed ever more deeply, approaching its content. This article discusses a framework in which modern audio signal and information processing can be placed so as to see more clearly the explored and unexplored realms of research. It is the author's personal framework, one that has helped shape research in the Laboratory of Acoustics and Audio Signal Processing at the Helsinki University of Technology, and some of its ideas and visions may be useful to others working in the field.
Bayesian single channel blind deconvolution using parametric signal and channel models
J. Hopgood, P. Rayner
Pub Date: 1999-10-01. DOI: 10.1109/ASPAA.1999.810872
This paper considers single-channel blind deconvolution, in which a degraded observed signal is modelled as the convolution of a non-stationary source signal with a stationary distortion operator. The source signal is recovered from the observation by modelling the source as a time-varying autoregressive process and the distortion operator as an IIR filter, and then using a Bayesian framework to estimate the parameters of the distorting filter, which can in turn be used to deconvolve the observed signal. The paper also discusses how the non-stationarity of the source signal allows the distortion operator to be uniquely identified.
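The signal model can be sketched as a forward simulation: a time-varying AR(2) source passed through a fixed all-pole channel. The parameter values below are assumed for illustration, and the paper's Bayesian estimation of the channel is not shown; the sketch only confirms that, once the channel is known, deconvolution is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4000

# Time-varying AR(2) source: fixed pole radius, slowly drifting pole angle
s = np.zeros(N)
theta = np.linspace(0.2 * np.pi, 0.4 * np.pi, N)   # slowly varying resonance
for n in range(2, N):
    a1, a2 = 2 * 0.95 * np.cos(theta[n]), -0.95 ** 2
    s[n] = a1 * s[n - 1] + a2 * s[n - 2] + rng.standard_normal()

# Stationary distortion: a fixed all-pole (IIR) channel applied to the source
b1, b2 = 0.6, -0.2
x = np.zeros(N)
for n in range(N):
    x[n] = s[n] + (b1 * x[n - 1] if n >= 1 else 0.0) + (b2 * x[n - 2] if n >= 2 else 0.0)

# With the channel parameters known, the all-pole filter inverts exactly (FIR inverse)
s_hat = x - b1 * np.concatenate(([0.0], x[:-1])) - b2 * np.concatenate(([0.0, 0.0], x[:-2]))
print(np.max(np.abs(s - s_hat)))  # ~0: source recovered when the channel is known
```

The blind problem is precisely that `b1, b2` are unknown; the paper's contribution is estimating them from `x` alone, exploiting the source's non-stationarity to make the channel identifiable.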
Considering non-stationarity for blind signal separation
A. Ahmed, P. Rayner, S. Godsill
Pub Date: 1999-09-16. DOI: 10.1109/ASPAA.1999.810862
We investigate the exploitation of non-stationarity for signal separation. A second-order decorrelation method is used to separate synthetic independent autoregressive signals, built from stationary blocks, that have been convolutively mixed. We compare results obtained with and without taking the non-stationarity into account: under certain conditions, exploiting non-stationarity yields more robust separation, and we present simulation results that support this. In addition, we apply the decorrelation method to real microphone signals to see how exploiting non-stationarity affects separation quality.
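The second-order principle can be sketched for the simpler instantaneous (rather than convolutive) case: non-stationary sources have different power ratios in different blocks, so jointly diagonalizing two block covariance matrices identifies the unmixing matrix. The mixing matrix and block structure below are illustrative, not the authors' convolutive algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10000

# Two independent non-stationary sources: powers swap between the two halves
s = np.vstack([
    np.concatenate([1.0 * rng.standard_normal(N), 0.2 * rng.standard_normal(N)]),
    np.concatenate([0.2 * rng.standard_normal(N), 1.0 * rng.standard_normal(N)]),
])
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # instantaneous mixing matrix
x = A @ s

# Covariances from two (locally stationary) blocks: R_k = A D_k A^T
R1 = np.cov(x[:, :N])
R2 = np.cov(x[:, N:])

# Eigenvectors of R1^{-1} R2 are the columns of A^{-T}; rows of V.T unmix
_, V = np.linalg.eig(np.linalg.solve(R1, R2))
V = V.real                                  # eigenvalues are real and distinct here
y = V.T @ x

# Each recovered signal should match one source up to scale, sign, and permutation
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print(corr.round(3))
```

The identifiability condition is visible in the algebra: if the two blocks had identical source power ratios (i.e. the sources were stationary in that sense), the eigenvalues would coincide and the eigenvectors, hence the unmixing matrix, would not be unique.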
Polyphonic pitch tracking using joint Bayesian estimation of multiple frame parameters
Paul J. Walmsley, S. Godsill, P. Rayner
Pub Date: 1999-09-16. DOI: 10.1109/ASPAA.1999.810864
We present a novel approach to pitch estimation and note detection in polyphonic audio signals. We pose the problem in a Bayesian probabilistic framework, which allows us to incorporate prior knowledge about the nature of musical data into the model. We exploit the high correlation between model parameters in adjacent frames of data by explicitly modelling the frequency variation over time using latent variables. Parameters are estimated jointly across a number of adjacent frames to increase the robustness of the estimation against transient events. Individual frames of data are modelled as the sum of harmonic sinusoids. Parameter estimation is performed using Markov chain Monte Carlo (MCMC) methods.
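The per-frame model, a sum of harmonic sinusoids, can be sketched via a least-squares fit at a candidate fundamental. The frame length, partial count, and residual-minimization pitch pick below are illustrative stand-ins for the likelihood inside the paper's joint multi-frame MCMC estimation.

```python
import numpy as np

def harmonic_fit_residual(frame, f0, fs, n_partials):
    """Least-squares fit of sin/cos pairs at multiples of f0; returns residual energy."""
    t = np.arange(len(frame)) / fs
    cols = []
    for k in range(1, n_partials + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    H = np.column_stack(cols)                       # harmonic basis matrix
    amps, *_ = np.linalg.lstsq(H, frame, rcond=None)
    return np.sum((frame - H @ amps) ** 2)

fs = 8000
t = np.arange(1024) / fs
# Test frame: four decaying partials of a 220 Hz tone
frame = sum(1.0 / k * np.sin(2 * np.pi * k * 220 * t) for k in range(1, 5))

# The residual is smallest at the true fundamental among the candidate pitches
candidates = [196.0, 220.0, 247.0]
residuals = [harmonic_fit_residual(frame, f0, fs, 8) for f0 in candidates]
print(candidates[int(np.argmin(residuals))])  # 220.0
```

Because the model is linear in the sinusoid amplitudes given the fundamental, those amplitudes can be fitted (or integrated out, as in the Bayesian treatment) cheaply, leaving the fundamental frequencies as the hard, nonlinear parameters to sample.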
Vocal interfaces to musical material
M. Kahrs
Pub Date: 1999-09-01. DOI: 10.1109/ASPAA.1999.810861
As the World Wide Web and the Internet become the dominant form of information distribution, consideration must be given to the indexing of musical material, including themes, melodies, rhythm tracks, and so forth. This paper describes the implementation of an algorithm for locating song titles from the vocal input of amateur singers. The prototype algorithm exceeds 90% accuracy across 9 different singers.
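A query-by-singing index of this kind can be sketched with pitch intervals and edit distance. The song database, interval representation, and matching rule below are hypothetical illustrations; the paper does not specify this particular algorithm.

```python
def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) distance between interval sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def intervals(midi_notes):
    """Transposition-invariant contour: successive pitch differences in semitones."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

# Hypothetical database: song title -> note sequence (MIDI note numbers)
songs = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Frere Jacques":   [60, 62, 64, 60, 60, 62, 64, 60],
}

# A sung query: transposed up a tone and slightly off on the last two notes
query = [62, 62, 69, 69, 71, 70, 69]
best = min(songs, key=lambda t: edit_distance(intervals(query), intervals(songs[t])))
print(best)  # Twinkle Twinkle
```

Working on intervals rather than absolute pitches makes the match invariant to the key the amateur happens to sing in, and the edit distance absorbs the occasional wrong or extra note.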
SVD-based optimal filtering with applications to noise reduction in speech signals
S. Doclo, M. Moonen
Pub Date: 1999-10-17. DOI: 10.1109/ASPAA.1999.810870
A class of SVD-based signal enhancement procedures is described, which amounts to a specific optimal filtering technique for the case where the so-called 'desired response' signal cannot be observed. It is shown that this optimal filter can be written as a function of the generalized singular vectors and singular values of so-called speech and noise data matrices. A number of simple symmetry properties of the optimal filter are derived, valid for both the white-noise and the coloured-noise case. The averaging step of the standard one-microphone SVD-based noise reduction techniques is also investigated, raising serious doubts about its necessity. When this technique is applied to multi-microphone noise reduction, it is shown that for simple scenarios, with localised sources and no multipath propagation, it exhibits a kind of beamforming behaviour. We further compare its performance with standard beamforming techniques, showing that for all reverberation times the SVD-based optimal filter outperforms beamforming.
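The optimal filter at the heart of this family of methods can be sketched in plain covariance form: with the noisy-speech covariance R_y and a noise covariance R_n estimated from noise-only segments, the MMSE filter is W = R_y^{-1}(R_y - R_n), applied to stacked sample windows. The AR(1) "speech" and white noise below are illustrative; the paper computes the same filter numerically via the GSVD of speech and noise data matrices, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 20000, 12          # number of samples and stacking (filter) length

# "Speech": a strongly correlated AR(1) process; noise: white, uncorrelated with it
s = np.zeros(N)
for n in range(1, N):
    s[n] = 0.95 * s[n - 1] + 0.3 * rng.standard_normal()
noise = rng.standard_normal(N)
y = s + noise

def stacked_cov(v, L):
    """Covariance of length-L sliding windows (the data-matrix formulation)."""
    X = np.lib.stride_tricks.sliding_window_view(v, L)
    return (X.T @ X) / len(X)

Ry = stacked_cov(y, L)
Rn = stacked_cov(noise, L)         # in practice estimated from noise-only frames

W = np.linalg.solve(Ry, Ry - Rn)   # MMSE filter: W = Ry^{-1} (Ry - Rn)

Y = np.lib.stride_tricks.sliding_window_view(y, L)
s_hat = (Y @ W)[:, -1]             # estimate of the newest sample in each window

err_before = np.mean((y[L - 1:] - s[L - 1:]) ** 2)
err_after = np.mean((s_hat - s[L - 1:]) ** 2)
print(err_after < err_before)      # True: the filter reduces the mean-square error
```

The identity R_y - R_n = R_s (speech and noise uncorrelated) is what lets the filter be built without ever observing the clean 'desired response' signal, which is exactly the situation the paper addresses.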