{"title":"视听虚拟3D场景的描述:听觉域的MPEG-4感知参数","authors":"A. Dantele, U. Reiter","doi":"10.1109/ISCE.2004.1375910","DOIUrl":null,"url":null,"abstract":"A high level of immei-siwi cart he provided f o r the user of virtrial irudiovis~rul erivirorimeiits when sorrnd and visuirl irnlwessiori get coordinated on 11 high quality level. Tlierefore. a coniprehensive scene rlescription Iangrruge is rieeiieil for both. the auditory and the visital purt. The mrrltiriredia sr(rfiilard MPEG-4 provides a powerfiil tool-set f o r the sceiie decription of 2D ond SD virtiiiil environments. fiir the undio part, apart from a coriventiorial yhysicul description, u novel approach is available which is based on perceptual pnrameters which hove been derived from psycho-acoustic e.rperirnents. The practical qualijcarion of this method is discussed when applied to auditory und audiovisual 30 scenes. Enhancements of-e proposed to an cxample application of the perceptrial upproacli which is included in the MPEG-4 stanrlurd arid an implementution f o r 30 rrrrdio rendering is introduced. Index Terms Auditory Scene Description, MPEG-4, Perceptual Parameters, Virtual Acoustics 1. AUDIOVISUAL SCENE DESCRIPTION I MPEG-4 Moving Picture Expens Group T E P E G ) has established novel approaches for the coding of multimedia content in the international standard MPEG-4. Auditory, visual and other content is subdivided into media objects which together build a 2D or 3D scene. Thus the most efficient coding scheme for each object can be chosen according to its type of media, e.g. video, audio, graphics, etc. [I]. For the combination of the objects MPEG-4 provides a powerful tool-set for scene description, the so-called BIFS (Binary Format for Scene Description) [ 2 ] . Here all the elements describing media objects and their properties are put together as nodes in a scene graph. The resulting structure reflects the mutual dependency of the single objects. This concept is based on the scene graph of the Virtual Reality Modeling Language (VRML) standard [3]. The audio part of this scene description (AudioBIFSj allows to specify the behavior of sound emitting objects in the scene (e.g. their position, level, directivity). These basic fuuctionalities have been extended in version 2 of AudioBIFS where new nodes. mainly for virtual acoustics in a 3D ‘This work was conductcd in the research group IAVAS (Intcmztivu AudioVirual Application Systmmr) which i s funded by lhr Thuringim Minisuy 01 Scicnce. Resrmh and thc Ans. Erlun. Germany. Andrcns Dnnlele and Ulnch Rcitcr are with 1hr Institute of Media Technology at 1he Technischc Univcrsicil Ilmcnau. 0.98614 Ilmenau. Germany (e-mail: andruas.dantrIr@lu-iImennu.dc. uhch.ruitcr@tuilnlmnu.duJ. environment, have been added [4]. These are often referred to as Advanced AudioBlFS (AABIFS) and are of main interest for the work described here. In general, the auralization of virtual scenes not only has to reproduce sound sources which are placed in the scenery but also to add ambient sound effects like reverberation. Thus the user can feel the surrounding virtual space by listening to the acoustic cues. The auditory impression for a listener in a real room can he described thoroughly by an impulse response. This represents the acoustic transfer function for a given pair of source and listener at their particular locations in a certain room [5]. When convolving a unreverberated source signal with an impulse response. the output yields a reverberated sound signal. 
The audio part of the BIFS scene description (AudioBIFS) allows the author to specify the behavior of sound-emitting objects in the scene (e.g. their position, level, directivity). These basic functionalities have been extended in version 2 of AudioBIFS, where new nodes, mainly for virtual acoustics in a 3D environment, have been added [4]. These are often referred to as Advanced AudioBIFS (AABIFS) and are of main interest for the work described here.

(This work was conducted in the research group IAVAS (Interactive AudioVisual Application Systems), which is funded by the Thuringian Ministry of Science, Research and the Arts, Erfurt, Germany. Andreas Dantele and Ulrich Reiter are with the Institute of Media Technology at the Technische Universität Ilmenau, 98693 Ilmenau, Germany; e-mail: andreas.dantele@tu-ilmenau.de, ulrich.reiter@tu-ilmenau.de.)

In general, the auralization of virtual scenes not only has to reproduce the sound sources placed in the scenery but also has to add ambient sound effects like reverberation. Thus the user can feel the surrounding virtual space by listening to these acoustic cues. The auditory impression for a listener in a real room can be described thoroughly by an impulse response. It represents the acoustic transfer function for a given pair of source and listener at their particular locations in a certain room [5]. When an unreverberated source signal is convolved with an impulse response, the output is a reverberated sound signal. The result sounds as if it were perceived at the specific location in the room where the impulse response was recorded.
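The following Python sketch shows this auralization step. The impulse response here is a synthetic stand-in (a unit direct sound plus a decaying noise tail) rather than a measured one; NumPy and SciPy are assumed.

```python
# Minimal auralization-by-convolution sketch: convolving a dry
# (unreverberated) signal with a room impulse response yields the
# reverberated signal. The impulse response is a synthetic stand-in
# for a measured one.

import numpy as np
from scipy.signal import fftconvolve

fs = 48_000                                   # sample rate in Hz

# Dry source signal: 0.5 s of a 440 Hz tone.
t = np.arange(int(0.5 * fs)) / fs
dry = np.sin(2 * np.pi * 440.0 * t)

# Stand-in impulse response: unit direct sound followed by an
# exponentially decaying noise tail as a crude late reverberation.
rng = np.random.default_rng(0)
ir = np.zeros(fs)                             # 1 s impulse response
ir[0] = 1.0                                   # direct sound
decay = np.exp(-np.arange(1, ir.size) / (0.3 * fs))
ir[1:] = 0.1 * rng.standard_normal(ir.size - 1) * decay

# FFT-based convolution; output length is len(dry) + len(ir) - 1.
wet = fftconvolve(dry, ir)
wet /= np.max(np.abs(wet))                    # normalize to avoid clipping

print(f"dry: {dry.size} samples, wet: {wet.size} samples")
```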
The characteristic features of any impulse response can be extracted from the temporal distribution and the energy content of its characteristic components: the direct sound, the early reflections, and the late reverberation. For a virtual space, the task of auralization can therefore be defined as the synthesis of these components in order to model an artificial impulse response. The main problem is to derive from the virtual scene description a suitable parameter set which can be used to synthesize the desired impulse response.
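The following sketch models such an artificial impulse response from the three components under deliberately simple assumptions: a unit pulse for the direct sound, a handful of discrete early reflections, and exponentially decaying noise for the late reverberation. All delay and gain values are invented for illustration and are not derived from any scene description.

```python
# Sketch of synthesizing an artificial impulse response from its
# three characteristic components. Delays and gains are illustrative
# assumptions, not the output of a real parameter-derivation step.

import numpy as np

fs = 48_000
ir = np.zeros(fs)                # 1 s impulse response

# 1) Direct sound: a single pulse at t = 0.
ir[0] = 1.0

# 2) Early reflections: a few discrete, attenuated copies within
#    roughly the first 80 ms (delay in seconds, linear gain).
early = [(0.012, 0.50), (0.021, 0.35), (0.034, 0.28), (0.055, 0.20)]
for delay_s, gain in early:
    ir[int(delay_s * fs)] += gain

# 3) Late reverberation: exponentially decaying noise starting after
#    the early part; the decay constant sets the reverberation time
#    of the modelled room.
rng = np.random.default_rng(1)
start = int(0.08 * fs)
n = np.arange(ir.size - start)
ir[start:] += 0.15 * rng.standard_normal(n.size) * np.exp(-n / (0.4 * fs))

print(f"synthesized IR: {ir.size} samples at {fs} Hz")
```

In a real renderer, the delays, gains, and decay constant would be exactly the parameter set that has to be derived from the scene description, as discussed above.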
When using Advanced AudioBIFS, the author of a virtual scene can choose between two different approaches to auditory modelling: (1) Based on physical properties such as the frequency-dependent directivity of sound sources and the absorption coefficients of materials, it is possible to specify what the resulting sound impression should be like. (2) Alternatively, the description can be based on psychoacoustic parameters which express the acoustic sensation perceived by the user. For this purpose, a set of perceptual parameters based on psycho-experimental research has been introduced [6].

II. THE PERCEPTUAL APPROACH OF MPEG-4

This approach is quite challenging, since the scope is shifted towards the user and thus to the perception of the human senses. Therefore, parameters have to be found which satisfy human needs and which can easily be explained and understood, even with little theoretical knowledge of sound propagation or electroacoustics. Although the field of auditory scene analysis in psychoacoustics is well elaborated, a widely accepted vocabulary of subjective attributes has not yet emerged from it (but the discussion is ongoing, e.g. [7]). Nevertheless, the technique underlying the MPEG-4 perceptual approach is already in use, e.g. for creating a virtual acoustic space [8].
We want to take a closer look at the perceptual approach given in the MPEG-4 standard, because it can be useful especially for audiovisual environments: the reproduction of audiovisual applications is often demanding in terms of processing power, and the visual and graphics part consumes most of it. This is especially true for interactive systems, which have to react in real time to every new user demand. For the rendering of the auditory part it is therefore sometimes not possible to render a very detailed description of the frequency-dependent behavior of sound sources or of the reflections within the virtual acoustics. Thus, instead of a definite and measurable description whose demands cannot be met, a perceptual description with emphasis on the subjective quality of the perceived acoustic sensation seems more appropriate.

A. Perceptual Parameters

In the MPEG-4 perceptual approach, a set of nine parameters has been chosen which should enable the author of an auditory scene to thoroughly describe the acoustic impression of a sound event. These are high-level parameters based on human perception. As explained in the following, they are related to objective criteria which correspond to the characteristic features of an impulse response. The perceptual parameters can be divided into three groups:

1) Source-related attributes: SourcePresence, SourceWarmth, and SourceBrilliance
2) Room-related attributes: LateReverberance, Heaviness, Liveness
3) Source/Room interaction: RoomPresence, RunningReverberance, Envelopment
The first group covers properties which are directly connected to the source and thus to the impression of the direct sound. This includes the amount of energy related to the direct sound (SourcePresence, which gives the listener a clue about the distance and directivity of the source) as well as the early amount of energy in the low (SourceWarmth) and high (SourceBrilliance) frequency bands. The second group collects features of the surrounding acoustical space, covering the damping properties during the late reverberation (LateReverberance) and the relative damping properties for the low (Heaviness) and high (Liveness) frequency bands. The third group contains parameters describing the behavior of a sound source within the room: the distribution of late energy in the room (RoomPresence), the early decay time (RunningReverberance), and the early energy distribution within the room in relation to the direct sound (Envelopment).
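To make the relation between these parameter groups and the features of an impulse response more concrete, the sketch below computes simple energy-based correlates: an early/late split at a fixed 80 ms boundary, and band-limited early energies as warmth- and brilliance-style cues. The time boundary, band edges, dB formulations, and dictionary keys are illustrative assumptions; the normative definitions in the MPEG-4 standard differ in detail.

```python
# Illustrative energy-based correlates of the perceptual parameter
# groups, computed from an impulse response. All boundaries, bands,
# and formulas are assumptions, not the normative MPEG-4 criteria.

import numpy as np
from scipy.signal import butter, sosfilt

def band_energy(x, fs, lo, hi):
    """Energy of x restricted to the band [lo, hi] Hz."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return float(np.sum(sosfilt(sos, x) ** 2))

def ir_correlates(ir, fs, t_direct=0.005, t_early=0.080):
    """Split an impulse response into direct / early / late segments
    and return illustrative energy-based correlates in dB."""
    n_d, n_e = int(t_direct * fs), int(t_early * fs)
    e_direct = float(np.sum(ir[:n_d] ** 2))
    e_early = float(np.sum(ir[n_d:n_e] ** 2))
    e_late = float(np.sum(ir[n_e:] ** 2))
    db = lambda e: 10.0 * np.log10(e + 1e-12)
    return {
        # source-related: direct(+early) energy as a presence-style cue
        "source_presence_db": db(e_direct + e_early),
        # low/high-band early energy as warmth-/brilliance-style cues
        "warmth_db": db(band_energy(ir[:n_e], fs, 50, 250)) - db(e_early),
        "brilliance_db": db(band_energy(ir[:n_e], fs, 3000, 12000)) - db(e_early),
        # room-related: late energy as a room-presence-style cue
        "room_presence_db": db(e_late),
        # source/room interaction: early-to-direct balance as an
        # envelopment-style cue
        "envelopment_db": db(e_early) - db(e_direct),
    }

# Example with a trivial synthetic impulse response.
fs = 48_000
rng = np.random.default_rng(2)
ir = np.zeros(fs)
ir[0] = 1.0
ir[1:] = 0.1 * rng.standard_normal(fs - 1) * np.exp(-np.arange(1, fs) / (0.3 * fs))
print(ir_correlates(ir, fs))
```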