
Final Program and Paper Summaries, 1991 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics

Digital Representation of Perceptual Criteria
J. Flanagan
Information signals are typically intended for human consumption. Human perception therefore contributes directly to fidelity criteria for digital representation. As computational capabilities increase and costs diminish, coding algorithms are able to incorporate more of the constraints that characterize perception. The incentive is still-greater economy for digital transmission and storage. Sight and sound are sensory modes favored by the human for information exchange. These modes are presently most central to human/machine communications and multimedia systems. The intricacies of visual and auditory perception are therefore figuring more prominently in signal coding. For example, taking account of the eye's sensitivity to quantizing noise as a function of temporal and spatial frequencies leads to good-quality coding of color motion images at fractions of a bit per pixel. Similarly, the characteristics of auditory masking, in both time and frequency domains, provide leverage to identify signal components which are irrelevant to perception and which need not consume coding capacity. This discussion draws a perspective on recent coding advances and points out opportunities for increased sophistication in representing perceptually important factors. It also indicates relationships between economies gained by perceptual coding alone, and those where source coding can trade on signal-specific characteristics to achieve further reductions in bit rate. It concludes with brief consideration of other sensory modalities, such as the tactile dimension, that might contribute to naturalness and ease of use in interactive multimedia information systems.
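The masking argument lends itself to a toy illustration. The sketch below (Python; not Flanagan's method: the 24-band split and the 15 dB in-band offset are illustrative assumptions) discards spectral components lying well below the strongest component in their band, on the premise that masked components need not consume coding capacity:

```python
import numpy as np

def perceptual_prune(frame, offset_db=15.0, n_bands=24):
    """Zero spectral components more than 'offset_db' below the peak of
    their band, a crude stand-in for a masked-threshold test. Both the
    band split and the offset are illustrative assumptions."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)
    keep = np.zeros(len(spectrum), dtype=bool)
    for idx in np.array_split(np.arange(len(spectrum)), n_bands):
        if idx.size == 0:
            continue
        keep[idx] = mag_db[idx] >= mag_db[idx].max() - offset_db
    return np.fft.irfft(spectrum * keep, n=len(frame))
```

In a real coder the retained components would then receive quantizer bits; here they are simply kept or dropped to show the masking test in isolation.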
Citations: 0
Narrowband Sound Localization Related To Acoustical Cues
J. C. Middlebrooks
When presented with narrowband sound sources, human subjects make characteristic errors in localization that are largely restricted to the vertical dimension. The current study attempts to account for this behavior in terms of the directional characteristics of the head and external ears. A model is described that effectively predicts the errors in narrowband localization and that can be applied to localization of more general types of sounds.
Citations: 4
Aspects In Modeling And Real-time Synthesis Of The Acoustic Guitar
M. Karjalainen, U. Laine, V. Valimaki
This paper will address the problem of modeling the acoustic guitar for real-time synthesis on signal processors. We will present a scheme for modeling the string for high-quality sound synthesis when the length of the string is changing dynamically. We will also focus on the problem of modeling the body of the guitar for real-time synthesis. Filter-based approaches were investigated using LPC estimation, IIR-filter synthesis, and FIR-filter approximation. Perceptual evaluation was used and taken into account. Real-time synthesis was implemented on the TMS320C30 floating-point signal processor. The presentation includes audio examples.

Introduction
Computational modeling of musical instruments is an alternative to commonly used and more straightforward sound synthesis techniques like FM synthesis and waveform sampling. The traditional approach to efficient modeling of a vibrating string has been to use proper digital filters or transmission lines; see e.g. Karplus and Strong [1] and its extensions by Jaffe and Smith [2]. These represent "semiphysical" modeling where only some of the most fundamental features of the string, especially the transmission line property, are retained to achieve efficient computation. More complete finite element models and other kinds of physical modeling may lead to very realistic sounds but tend to be computationally too expensive for real-time purposes. Modeling of the guitar body for real-time sound synthesis seems too difficult unless a digital filter approach to approximate the transfer function is used. The derivation of the detailed transfer function from mechanical and acoustical parameters seems impossible. The remaining choice is to estimate the transfer function filter from measurements of a real guitar or to design a filter that approximates the general properties of the real guitar body. In addition to the strings and body, the interactions between them (at least between the strings) should be included.

String Modeling
The natural way of modeling a guitar string is to describe it as a two-directional transmission or delay line (see Fig. 1a, which marks the excitation point) where the vibrational waves travel in both directions, reflecting at both ends. If all losses and other nonidealities are reduced to the reflection filters at the end points, the computation of the ideal string is efficient using two delay lines. The next problem is how to approximate the fractional part of the delay to achieve any (non-integer) length of the delay line. Allpass filters [2] are considered a good solution if the string length is fixed. If the length is dynamically varying, however, it is very difficult to avoid transients and glitches when the integer part of the delay line must change its length.
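For reference, a minimal single-delay-line plucked-string sketch in the spirit of Karplus and Strong [1] follows (Python). It is not the paper's two-directional model, and its delay length is integer-only, so it exhibits exactly the pitch-quantization problem that the fractional-delay allpass filters [2] discussed above are meant to solve:

```python
import numpy as np

def karplus_strong(freq, dur, fs=44100, decay=0.996):
    """Plucked string as a recirculating delay line: a noise burst decays
    through a two-point averaging (lowpass) loop filter. The integer line
    length n quantizes the pitch; a fractional-delay filter would fix that."""
    n = int(fs / freq)                        # integer delay-line length
    line = np.random.uniform(-1.0, 1.0, n)    # pluck: random excitation
    out = np.empty(int(dur * fs))
    for i in range(len(out)):
        out[i] = line[i % n]
        line[i % n] = decay * 0.5 * (line[i % n] + line[(i + 1) % n])
    return out
```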
Citations: 7
New technics based on the wavelet transform for the restoration of old recordings
J. Valière, S. Montrésor, J. Allard, M. Baudry
Digital techniques used for the restoration of old recordings are presented in this paper. Three different flaws can be present in old recordings: harmonic distortion, impulsive noise, and background noise. Only the cancellation of the impulsive noise and the reduction of the background noise are considered. In order to cancel the impulsive noise, the corrupted samples are replaced by interpolated samples. An interpolator that uses the information located near the impulsive noise is required. In this paper, two different methods of interpolation are compared. For the reduction of the background noise, we have used a method worked out by Ephraim and Malah that does not create musical noise. In order to improve the filtering of transients, a decomposition of the signal into several frequency channels is performed beforehand.
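The summary does not specify the paper's two interpolators, so the sketch below substitutes plain linear interpolation; it only illustrates the replace-the-corrupted-samples scheme, not the methods actually compared:

```python
import numpy as np

def interpolate_clicks(x, corrupted):
    """Replace samples flagged as impulsive noise (boolean mask) with
    values linearly interpolated from the surrounding clean samples."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    good = ~corrupted
    x[corrupted] = np.interp(idx[corrupted], idx[good], x[good])
    return x
```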
Citations: 1
Non-Subtractive Dither
S. Lipshitz, R. Wannamaker, J. Vanderkooy, J. N. Wright
A mathematical investigation of quantizing systems using non-subtractive dither is presented. It is shown that with a suitably-chosen dither probability density function (pdf), certain moments of the total error can be made signal-independent and the error signal rendered white, but that statistical independence of the error and the input signal is not achievable. Some of these results are known but appear to be unpublished. The earliest references to many of these results are contained in manuscripts by one of the authors [JNW], but they were later discovered independently by Stockham and Brinton [2, 3], Lipshitz and Vanderkooy [4], and Gray [5]. In view of many widespread misunderstandings regarding non-subtractive dither, it seems that formal presentation of these results is long overdue.
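The canonical positive case of this theory is triangular-pdf (TPDF) dither with a peak-to-peak width of two quantizer steps, which renders the first two moments of the total error signal-independent. A minimal non-subtractive quantizer sketch (the RNG seed is only for reproducibility):

```python
import numpy as np

def quantize_tpdf(x, step, rng=np.random.default_rng(0)):
    """Non-subtractive TPDF dither: add triangular noise of 2-LSB
    peak-to-peak width before rounding and do NOT subtract it after.
    The triangle is the sum of two independent uniform variables."""
    d = rng.uniform(-step / 2, step / 2, np.shape(x)) \
      + rng.uniform(-step / 2, step / 2, np.shape(x))
    return step * np.round((np.asarray(x) + d) / step)
```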
Citations: 6
Localization of virtual sound sources synthesized from model HRTFs
F. Wightman, D. Kistler
Published data from our laboratory and others suggest that under laboratory conditions human listeners localize virtual sound sources with nearly the same accuracy as they do real sources. The virtual sources in these experiments are digitally synthesized and presented to listeners over headphones. Synthesis of a given virtual source is based on free-field to eardrum acoustical transfer functions ("head-related" transfer functions, or HRTFs) that are measured from both ears of each individual listener. It follows that synthesis of a virtual auditory space of 265 source locations for each listener requires storage and processing of 530 complex, floating-point HRTFs. If each HRTF is represented by 256 complex spectral values, the total database consists of 271,360 floating-point numbers. Thus, while the perceptual data may argue for the viability of 3-dimensional auditory displays based on the virtual source techniques, the massive data storage and management requirements may impose some practical limitations.
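The storage figures quoted above follow directly from the stated parameters; a few lines confirm the arithmetic:

```python
locations = 265                 # virtual source positions per listener
hrtfs = locations * 2           # both ears -> 530 HRTFs
floats = hrtfs * 256 * 2        # 256 complex bins, real + imaginary parts
print(hrtfs, floats)            # 530 271360
```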
Citations: 8
Auditory Images As Input For Speech Recognition Systems
R. Patterson, J. Holdsworth, P. Thurston, T. Robinson
Over the past decade, hearing scientists have developed a number of time-domain models of the processing performed by the cochlea in an effort to develop a reasonably accurate multi-channel representation of the pattern of neural activity flowing from the cochlea up the auditory nerve to the cochlear nucleus [1]. It is often assumed that peripheral auditory processing ends at the output of the cochlea and that the pattern of activity in the auditory nerve is in some sense what we hear. In reality, this neural activity pattern (NAP) is not a good representation of our auditory sensations because it includes phase differences that we do not hear and it does not include auditory temporal integration (TI). As a result, several of the models have been extended to include periodicity-sensitive TI [2], [3], [4], which converts the fast-flowing neural activity pattern into a form that is much more like the auditory images we experience in response to sounds. When these models are applied to speech sounds, the auditory images of vowels reveal an elaborate formant structure that is absent in the more traditional representation of speech, the spectrogram. An example is presented on the left in the figure; it is the auditory image of the stationary part of the vowel /ae/ as in 'bab' [4]. The abscissa of the auditory image is 'temporal integration interval' and each line of the image shows the activity in one frequency channel of the auditory model. In general terms, activity on a vertical line in the auditory image shows that there is a correlation in the sound at that temporal interval. The concentrations of activity are the formants of the vowel.
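A correlogram is a common computational analogue of such a periodicity-sensitive auditory image: each filterbank channel is rectified (a crude hair-cell stage) and autocorrelated along the integration-interval axis. The sketch below is only that analogue, not the model in the paper; the channel centers and bandwidths are arbitrary assumptions:

```python
import numpy as np
from scipy.signal import butter, lfilter

def correlogram(x, fs, centers=(250, 500, 1000, 2000, 4000), max_lag_ms=20):
    """Rows: frequency channels; columns: temporal integration interval.
    Intended for short segments (full autocorrelation is O(N^2)); use an
    fs high enough that 1.2 * max(centers) stays below Nyquist."""
    lags = int(fs * max_lag_ms / 1000)
    image = []
    for fc in centers:
        b, a = butter(2, [0.8 * fc / (fs / 2), 1.2 * fc / (fs / 2)], btype="band")
        nap = np.maximum(lfilter(b, a, x), 0.0)   # rectified channel activity
        ac = np.correlate(nap, nap, mode="full")[len(nap) - 1:len(nap) - 1 + lags]
        image.append(ac / (ac[0] + 1e-12))        # normalize at zero lag
    return np.array(image)
```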
Citations: 1
A CD-Quality Audio and Color Still Image Multi-Media Platform using the DSP32C
S. Quackenbush
The paper describes a multi-media database browser based on an AT&T 386/SX PC with a DSP32C coprocessor. Each item in the multi-media database consists of a 512 by 480 pixel color still image and a 20 Hz to 20 kHz monophonic audio signal that can have arbitrary duration. Compressed image and audio signals are stored in a database and are retrieved through a communications channel, decoded using the DSP32C, and displayed and played. The channel could be the PC backplane, a local or wide area network, or a basic rate ISDN telecommunications link. Audio is compressed 6:1 (2.67 bits/sample at a 48 kHz sampling rate) and produces a reconstructed signal that is indistinguishable from the original when heard over a loudspeaker. Image compression is data-dependent and ranges from 20:1 to 50:1 (1.2 to 0.5 bits per pixel) for reconstructed images with negligible distortion.
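The quoted rates are easy to sanity-check, assuming 16-bit PCM audio and 24-bit color pixels (bit depths the abstract does not state):

```python
print(16 / 6)            # 2.67 bits/sample for the 6:1 audio compression
for r in (20, 50):
    print(24 / r)        # 1.2 and 0.48 (~0.5) bits/pixel for 20:1 to 50:1
```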
Citations: 0
Auditory Processing with spatio-temporal codes
F. Berthommier, J. Schwartz
We are currently developing a model of auditory processing including several specialized modules connected partly in series, partly in parallel. The signal is first decomposed between frequency channels in the cochlea and transduced into spike trains, which are then directed towards auditory centres where they are processed by neurons with various response characteristics, with either a preference for tonic behavior synchronized on the frequency components of the incident stimulation, or for phasic responses. A number of signal characteristics are exhaustively mapped, such as frequency, amplitude modulation, intensity, interaural delays, or timing between acoustic events, and these intermediary representations further converge towards decoding networks. We insist all along this pathway on the necessity to cope with the intrinsic temporal characteristics of the spike trains, and we introduce processing mechanisms based on coincidence computations, which can deal with both time and space in a natural way.
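A minimal instance of the coincidence computations mentioned above is counting spikes in one train that fall within a small tolerance of a spike in another; the 0.5 ms window below is an assumed parameter, not one taken from the paper:

```python
import numpy as np

def coincidence_count(spikes_a, spikes_b, window=0.5e-3):
    """Count spikes in train A (times in seconds) that have a partner in
    train B within +/- window. spikes_b is sorted for binary search."""
    spikes_b = np.sort(spikes_b)
    hits = 0
    for t in spikes_a:
        j = np.searchsorted(spikes_b, t)
        for k in (j - 1, j):                 # nearest neighbours on each side
            if 0 <= k < len(spikes_b) and abs(spikes_b[k] - t) <= window:
                hits += 1
                break
    return hits
```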
Citations: 0
Adaptive Noise Cancellation for Hearing-Aid Application
H. Levitt, T. Schwander, M. Weiss
Noise reduction systems using two or more microphones are generally more effective than single-microphone systems. Under ideal conditions, an adaptive two-microphone system with one microphone placed at the noise source can achieve perfect cancellation. For hearing-aid applications it is not usually practical to place a microphone at or near the noise source. It is possible, however, to mount both microphones on the head, with a directional microphone facing the noise source and an omnidirectional microphone picking up speech plus noise. In practice, there is continual movement of the head relative to the speech and noise sources, which may adversely affect the adaptive cancellation algorithm. Another practical problem is that of room reverberation. A head-mounted two-microphone adaptive noise cancellation system was evaluated experimentally in an anechoic chamber and in rooms with reverberation times of up to 0.6 seconds. Significant improvements in speech intelligibility were obtained with both normal-hearing and hearing-impaired listeners.
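The summary does not name the adaptation algorithm; a textbook LMS canceller is the standard realization of this two-microphone arrangement, so the sketch below assumes it:

```python
import numpy as np

def lms_cancel(primary, reference, taps=32, mu=0.01):
    """primary: omnidirectional mic (speech + noise); reference:
    directional mic aimed at the noise. An LMS filter predicts the
    noise in the primary from the reference; the residual e is the
    enhanced speech. Step size mu is illustrative, not tuned."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]   # most recent reference samples
        e = primary[n] - w @ x            # subtract the noise estimate
        w += 2 * mu * e * x               # LMS weight update
        out[n] = e
    return out
```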
Citations: 0