Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169742
G. Ahlbom, F. Bimbot, G. Chollet
Atal [1] introduced a technique for decomposing speech into phone-length temporal events in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the $p$-dimensional space of log-area ratios $y_i = \ln\big((1+k_i)/(1-k_i)\big)$, where the $k_i$ are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (a vector quantization codebook). A set of speech segments ("polysons") has been encoded using this technique. It includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. It is demonstrated that these new tools provide an adequate environment in our search for better rules in acoustic speech synthesis.
Title: Modeling spectral speech transitions using temporal decomposition techniques. Published in: ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.
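The log-area-ratio mapping quoted in the abstract above can be sketched in a few lines. This is only an illustration of the formula $y_i = \ln((1+k_i)/(1-k_i))$ and its inverse; the function names and the sample coefficient values are assumptions for the example and do not come from the paper.

```python
import math

def log_area_ratios(reflection_coeffs):
    """Map reflection coefficients k_i (each strictly inside (-1, 1)) to log-area ratios."""
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection_coeffs]

def reflection_coefficients(lars):
    """Inverse map: y = 2 * atanh(k), hence k = tanh(y / 2)."""
    return [math.tanh(y / 2.0) for y in lars]

# Hypothetical reflection coefficients, chosen only for illustration.
k = [0.0, 0.5, -0.9]
y = log_area_ratios(k)
k_back = reflection_coefficients(y)
```

The mapping stretches the region near $|k_i| = 1$, which is one common reason for working with log-area ratios rather than with the reflection coefficients directly.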
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169551
D. Paul
Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to such speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated-word database.
Title: A speaker-stress resistant HMM isolated word recognizer. Published in: ICASSP '87.
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169690
Stewart Smith, P. Denyer
Standard-part multiply/accumulators in digital signal processing are often used to compute vector products. In the realm of custom VLSI, direct computation of vector products can result in area savings over classical multiply/accumulate methods. A methodology is presented for composition of VLSI architectures for direct vector multiplication, based on three fundamental computational elements: the register, the data selector, and the carry-save add-shift (CSAS) computer. The CSAS computer is a linear array of gated carry-save adders which performs shifting accumulation of partial results. Two's complement serial/parallel carry-save accumulation provides performance, while the use of symmetric-coded distributed arithmetic eliminates redundant computation to effect area savings.
Title: Serial/Parallel architectures for area-efficient vector multiplication. Published in: ICASSP '87.
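The idea of computing a vector product directly by bit-serial shifting accumulation can be sketched in software. The example below uses plain two's complement distributed arithmetic with a lookup table of coefficient sums; it is not the symmetric-coded variant or the carry-save hardware described in the abstract, and the function name and test values are assumptions for illustration.

```python
def da_inner_product(coeffs, xs, bits):
    """Inner product sum(c * x) by distributed arithmetic.

    Each x must fit in `bits`-bit two's complement. One table lookup and one
    shifted add are performed per bit position instead of one multiply per term.
    """
    n = len(coeffs)
    # Table entry `addr` holds the sum of the coefficients selected by the bits of addr.
    table = [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
             for addr in range(1 << n)]
    acc = 0
    for b in range(bits):
        # Gather bit b of every input word into a table address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        term = table[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc = acc - term if b == bits - 1 else acc + term
    return acc

result = da_inner_product([3, -5, 7], [2, -3, 4], bits=4)
```

The table grows as $2^n$ with the vector length $n$, which is why practical designs look for ways to remove redundant table content.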
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169602
M. Lagunas, F. Vallverdú, M. Santamaria
This paper is a first attempt to formalize non-linear system design and to locate it in relation to similar linear processing techniques. The paper opens with a summary of the relationship between linear objectives and classical adaptive algorithms in non-linear design problems, and points out the potential of random search techniques for handling the different problems posed by non-linear objectives. Next, the similarity between probability distribution functions and power spectral density in linear processing is shown, supported by an example of non-linear system design. Finally, some prospective work on the problem of adaptive companding design is reported.
Title: Non-linear adaptive signal processor. Published in: ICASSP '87.
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169749
A. Nieminen, P. Heinonen, Y. Neuvo
In this paper, we introduce a new type of nonlinear filter, the Adaptive Median Hybrid (AMH) filter, for the suppression and detection of short-duration interference. In AMH filters, adaptive filter substructures estimate the current signal value from the future and past signal values. The output of the overall filter is the median of the adaptive filter outputs and the current signal value. This kind of nonlinear filter structure is shown to adapt to and preserve rapid changes in signal characteristics well, while filtering out short-duration interference. By examining the difference between the original and filtered data, interference can be detected. We introduce two types of AMH filter, the AMH filter with separate adaptive substructures (SAMH) and the AMH filter with coupled substructures (CAMH), which have different convergence properties and implementations. We use both synthetic and real data (speech and electroencephalogram (EEG)) to show the applicability of the proposed filters.
Title: Suppression and detection of impulse type interference using adaptive median hybrid filters. Published in: ICASSP '87.
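The median-of-three structure described in the abstract can be illustrated with a simplified, non-adaptive sketch. Here the two substructures are fixed averages of the past and future samples rather than the adaptive filters of the paper, so this shows only the hybrid median combination and the difference-based detection step. The function names, window length, and threshold are assumptions for illustration.

```python
def median_hybrid(x, m):
    """Median of (forward estimate, backward estimate, current sample).

    Simplification: both estimates are plain averages over m samples,
    standing in for the adaptive substructures. Edge samples are copied.
    """
    y = list(x)
    for n in range(m, len(x) - m):
        forward = sum(x[n - m:n]) / m            # estimate from past samples
        backward = sum(x[n + 1:n + 1 + m]) / m   # estimate from future samples
        y[n] = sorted((forward, backward, x[n]))[1]
    return y

def detect_impulses(x, y, threshold):
    """Flag samples where the filter changed the input by more than threshold."""
    return [n for n, (a, b) in enumerate(zip(x, y)) if abs(a - b) > threshold]

# Constant signal with one impulse at index 6.
noisy = [1.0] * 6 + [11.0] + [1.0] * 6
cleaned = median_hybrid(noisy, 2)
hits = detect_impulses(noisy, cleaned, threshold=5.0)

# A step edge passes through unchanged.
step = [0.0] * 4 + [1.0] * 4
step_out = median_hybrid(step, 2)
```

Because an isolated impulse corrupts at most one of the three candidates at any time, the median discards it, while a step edge is supported by two of the three candidates and survives.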
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169415
A. Steinhardt
In this paper we present an algorithm which answers the following question: Given a finite number of correlation lags, what is the shortest length sequence which could have produced these correlations? This question is equivalent to asking for the minimum order moving average (all-zero) model which can match a given set of correlations. The algorithm applies to both the case of uniform correlations and missing lag correlations. The algorithm involves quadratic programming coupled with a new representation of the boundary of correlations derived from finite sequences in terms of the spectral decomposition of a certain class of banded Toeplitz matrices.
Title: Reconstructing a finite length sequence from several of its correlation lags. Published in: ICASSP '87.
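The forward direction of the problem posed in the abstract is easy to state in code: a finite sequence of length $L$ produces correlation lags that vanish beyond lag $L-1$. The sketch below shows only this forward map, not the quadratic-programming reconstruction of the paper; the function name and sample sequences are assumptions for illustration.

```python
def correlation_lags(seq, max_lag):
    """Deterministic autocorrelation r[k] = sum_i seq[i] * seq[i + k] for k = 0..max_lag."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

lags = correlation_lags([1, 2, 3], 4)
# The time-reversed sequence yields the same lags, so the correlations
# alone do not determine a unique sequence.
lags_reversed = correlation_lags([3, 2, 1], 4)
```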
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169844
H. Ney, D. Mergel, A. Noll, A. Paeseler
This paper describes a data-driven organization of the dynamic programming beam search for large-vocabulary, continuous speech recognition. This organization can be viewed as an extension of the one-pass dynamic programming algorithm for connected word recognition. In continuous speech recognition we are faced with a huge search space, and search hypotheses have to be formed at the 10-ms level. The organization of the search presented has the following characteristics. Its computational cost is proportional only to the number of hypotheses actually generated and is independent of the overall size of the potential search space. There is no limit on the number of word hypotheses; the only limit is on the overall number of hypotheses, due to memory constraints. The implementation of the search has been studied and tested on a continuous speech database comprising 20672 words.
Title: A data-driven organization of the dynamic programming beam search for continuous speech recognition. Published in: ICASSP '87.
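The property that cost depends only on the hypotheses actually generated can be sketched with a time-synchronous beam search that stores active hypotheses in a dictionary instead of a full state-by-time table. This is a generic illustration under assumed interfaces (`successors`, `local_cost`, and the toy three-state model are hypothetical), not the organization used in the paper.

```python
def beam_search(frames, successors, local_cost, start_state, beam_width):
    """Time-synchronous dynamic programming with beam pruning.

    Only active hypotheses are stored, so the work per frame scales with
    the number of surviving hypotheses, not with the size of the state space.
    Assumes every state has at least one successor.
    """
    active = {start_state: (0.0, [start_state])}  # state -> (cost, path)
    for frame in frames:
        expanded = {}
        for state, (cost, path) in active.items():
            for nxt in successors(state):
                c = cost + local_cost(nxt, frame)
                # Dynamic programming recombination: keep the cheapest way into nxt.
                if nxt not in expanded or c < expanded[nxt][0]:
                    expanded[nxt] = (c, path + [nxt])
        best = min(c for c, _ in expanded.values())
        # Beam pruning: drop hypotheses too far behind the current best.
        active = {s: h for s, h in expanded.items() if h[0] <= best + beam_width}
    return min(active.values(), key=lambda h: h[0])

# Toy left-to-right model with states 0, 1, 2 (hypothetical).
def toy_successors(s):
    return [s, s + 1] if s < 2 else [s]

def toy_cost(state, frame):
    return abs(frame - state)

best_cost, best_path = beam_search([0, 0, 1, 1, 2, 2], toy_successors,
                                   toy_cost, start_state=0, beam_width=1.0)
```

Storing full paths in each hypothesis keeps the sketch short; a practical implementation would more likely keep back-pointers and trace back at the end.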
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169492
C. Burrus
The traditional Cooley-Tukey and prime factor FFT algorithms either produce the output in scrambled order or require the input data to be prescrambled. Several methods for scrambling and unscrambling the DFT are presented. The new result in this paper is the observation that the radix-4, radix-8, or any radix-$2^m$ FFT can be modified to give the output in the same bit-reversed order as the radix-2 FFT.
Title: Bit reverse unscrambling for a radix-$2^M$ FFT. Published in: ICASSP '87.
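The difference between the two scrambled orders mentioned in the abstract can be made concrete. The sketch below computes the bit-reversed index order of a radix-2 FFT and the digit-reversed order that an unmodified radix-4 FFT would typically produce, for a length-16 transform, plus a simple bit-reversal unscrambler. It does not implement the modified FFT itself; the function names are assumptions for illustration.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def digit_reverse(i, digits, radix):
    """Reverse the lowest `digits` base-`radix` digits of i."""
    r = 0
    for _ in range(digits):
        r = r * radix + i % radix
        i //= radix
    return r

def unscramble(data, bits):
    """Return data reordered from bit-reversed order to natural order."""
    return [data[bit_reverse(i, bits)] for i in range(len(data))]

bit_order = [bit_reverse(i, 4) for i in range(16)]         # radix-2 scrambling
digit_order = [digit_reverse(i, 2, 4) for i in range(16)]  # radix-4 scrambling
```

Since the two permutations differ, a single bit-reversal unscrambler cannot be reused after a standard radix-4 FFT, which is what makes a radix-$2^m$ FFT with bit-reversed output convenient.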
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169379
J. Justice, S. Dougherty
Generalized linear inversion (GLI) is a parameter estimation technique which shows great potential for use in solving inverse problems in many fields, including exploration seismology. We suggest a particular implementation of the procedure which may be used for simultaneous parameter estimation, and illustrate its use with the 1-D seismic deconvolution problem. The procedure is easily extended to the multidimensional case, and we illustrate this extension by computing depth and velocity structure in a flat-layer model using multiple offset data.
Title: Generalized linear inversion applied to seismic data in one and two dimensions. Published in: ICASSP '87.
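Estimating depth and velocity from multiple-offset data can be sketched as an iterative linearized least-squares fit. The example below assumes that GLI is realized as an undamped Gauss-Newton iteration and assumes a single flat reflector with two-way travel time $t(x) = \sqrt{4z^2 + x^2}/v$; the forward model, function names, offsets, and starting values are all illustrative choices, not taken from the paper.

```python
import math

def travel_time(offset, depth, velocity):
    """Two-way reflection time for a flat reflector (assumed forward model)."""
    return math.sqrt(4.0 * depth * depth + offset * offset) / velocity

def gli_flat_layer(offsets, observed, depth, velocity, iterations=10):
    """Gauss-Newton updates of (depth, velocity) from travel-time residuals."""
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, t_obs in zip(offsets, observed):
            path = math.sqrt(4.0 * depth * depth + x * x)
            resid = t_obs - path / velocity
            jz = 4.0 * depth / (velocity * path)  # d t / d depth
            jv = -path / (velocity * velocity)    # d t / d velocity
            # Accumulate the 2x2 normal equations (J^T J) delta = J^T resid.
            a11 += jz * jz
            a12 += jz * jv
            a22 += jv * jv
            g1 += jz * resid
            g2 += jv * resid
        det = a11 * a22 - a12 * a12
        depth += (a22 * g1 - a12 * g2) / det
        velocity += (a11 * g2 - a12 * g1) / det
    return depth, velocity

# Synthetic, noise-free data from a hypothetical model: 1000 m depth, 2000 m/s.
offsets = [0.0, 1000.0, 2000.0]
observed = [travel_time(x, 1000.0, 2000.0) for x in offsets]
z_est, v_est = gli_flat_layer(offsets, observed, depth=800.0, velocity=1800.0)
```

With noisy data or a poor starting model, a damped update would usually be safer than the undamped step used here.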
Pub Date: 1987-04-06; DOI: 10.1109/ICASSP.1987.1169628
T. Svendsen, F. Soong
For large-vocabulary and continuous speech recognition, the sub-word-unit-based approach is a viable alternative to the whole-word-unit-based approach. For preparing a large inventory of subword units, automatic segmentation is preferable to manual segmentation, as it substantially reduces the work associated with the generation of templates and gives more consistent results. In this paper we discuss some methods for automatically segmenting speech into phonetic units. Three different approaches are described: one based on template matching, one based on detecting the spectral changes that occur at the boundaries between phonetic units, and one based on a constrained-clustering vector quantization approach. An evaluation of the performance of the automatic segmentation methods is given.
Title: On the automatic segmentation of speech signals. Published in: ICASSP '87.
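Of the three approaches listed in the abstract, the spectral-change one is the simplest to sketch. The example below places a boundary wherever the Euclidean distance between consecutive feature frames has a local peak above a threshold. The distance measure, peak-picking rule, threshold, and toy feature vectors are all assumptions for illustration and are not the paper's method.

```python
import math

def spectral_change_boundaries(frames, threshold):
    """Return frame indices where a new segment starts.

    A boundary is placed at frame i + 1 when the distance between frames
    i and i + 1 exceeds the threshold and is a local maximum.
    """
    dist = [math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))
            for f, g in zip(frames, frames[1:])]
    boundaries = []
    for i, d in enumerate(dist):
        left = dist[i - 1] if i > 0 else float("-inf")
        right = dist[i + 1] if i + 1 < len(dist) else float("-inf")
        if d > threshold and d >= left and d > right:
            boundaries.append(i + 1)
    return boundaries

# Toy feature sequence with three steady regions (hypothetical values).
frames = [[0.0, 0.0]] * 3 + [[1.0, 1.0]] * 3 + [[5.0, 5.0]] * 2
segments = spectral_change_boundaries(frames, threshold=1.0)
```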