IEEE International Symposium on Consumer Electronics, 2004: Latest Publications

Hard disk drive enhancements for consumer electronics products

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1376015

D. Singh, V. Iyer

The use of hard disk drives for storing video and gaming content is an emerging field, and there is a lack of field data on drive reliability in this new application. The drive industry is experienced in designing drives for the desktop and server arena, where error-free data is of paramount importance. For the consumer electronics industry, the usage profile and the associated error recovery are significantly different. Factors affecting reliability and customer satisfaction are acoustic noise, reliability at elevated temperature, and the ability to protect against shock and vibration while delivering interrupt-free content. The effects of acoustic, thermal and mechanical design are reviewed. Methods of measuring sound power and judging sound quality are described. The changes in drive design for the consumer electronics environment are also described.
Abbreviations: ATA: AT Attachment; DE: Digital Entertainment; ECC: Error Correction Code; DUT: Device Under Test; FDB: Fluid Dynamic Bearing; GB: Gigabyte; HDD: Hard Disk Drive; STB: Set Top Box.
Index Terms: HDDs for AV applications, AV performance, HDD reliability, performance evaluation.
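The abstract mentions measuring sound power but does not give the procedure. The sketch below assumes the common free-field-over-a-reflecting-plane relation, which is the core of ISO 3744-style measurements; HDD noise is commonly measured per ISO 7779. Under that relation, the sound power level is the energy-averaged sound pressure level over the microphone positions plus 10*log10(S/S0). Here S is the measurement surface area and S0 = 1 m². Background-noise and environmental corrections are deliberately omitted. The function names are illustrative and do not come from the paper.

```python
import math


def energy_average_spl(levels_db):
    """Energy (not arithmetic) average of sound pressure levels, in dB."""
    if not levels_db:
        raise ValueError("need at least one microphone reading")
    # Convert each level to relative mean-square pressure, average, convert back to dB.
    mean_power = sum(10 ** (lp / 10) for lp in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def sound_power_level(levels_db, surface_area_m2, ref_area_m2=1.0):
    """L_W = energy-averaged L_p + 10*log10(S / S0).

    Background-noise (K1) and environmental (K2) corrections are omitted.
    """
    return energy_average_spl(levels_db) + 10 * math.log10(surface_area_m2 / ref_area_m2)


# Example: four equal readings on a hemisphere of radius 1 m (S = 2*pi m^2).
print(round(sound_power_level([30.0, 30.0, 30.0, 30.0], 2 * math.pi), 1))
```

Suppose every microphone reads 30 dB on a 1 m hemisphere. The surface term then adds 10*log10(2π), about 8 dB, so the sound power level comes out near 38 dB.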

Citations: 0

A novel data fusion technique for imaging devices

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375924

A. Castorina, A. Capra, A. Bruna, S. Battiato

A. Bruna, A. Capra and A. Castorina work at the STMicroelectronics AST Catania Lab, Catania, Italy (e-mail: name.surname@st.com). Sebastiano Battiato works at the Università di Catania, Dipartimento di Matematica ed Informatica, Catania, Italy (e-mail: battiato@dmi.unict.it).
Abstract: The paper presents a complete system for building an improved picture with a higher dynamic range. It does so by combining different pictures of the same scene acquired under different exposure settings. The image data fusion merges the original data, weighting each single contribution on a per-pixel basis by a suitable function of the pixel data. Experiments confirm the effectiveness of this approach.
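The abstract describes the fusion as a per-pixel weighted merge of the differently exposed shots, but it does not name the weighting function. The sketch below rests on two illustrative assumptions, neither of which is the authors' method:

- a linear sensor response, so each exposure's radiance estimate is the pixel value divided by the exposure time;
- a simple triangular "hat" weight that trusts mid-tones and discounts near-black and saturated pixels.

```python
def hat_weight(z, z_max=255):
    """Triangular weight: trust mid-tones, discount near-black and near-saturated pixels."""
    return min(z, z_max - z)


def fuse_exposures(images, exposure_times, z_max=255):
    """Per-pixel weighted fusion of differently exposed shots of the same scene.

    images: equally long flat lists of pixel values in [0, z_max].
    exposure_times: one exposure time per image.
    Assumes a linear sensor response, so z / t estimates the scene radiance.
    """
    if not images or len(images) != len(exposure_times):
        raise ValueError("need one exposure time per image")
    n_pixels = len(images[0])
    if any(len(img) != n_pixels for img in images):
        raise ValueError("all images must have the same size")
    fused = []
    for i in range(n_pixels):
        num = den = 0.0
        for img, t in zip(images, exposure_times):
            w = hat_weight(img[i], z_max)
            num += w * img[i] / t
            den += w
        if den > 0:
            fused.append(num / den)
        else:
            # Every exposure is black or saturated here: fall back to a plain average.
            fused.append(sum(img[i] / t for img, t in zip(images, exposure_times)) / len(images))
    return fused
```

As a usage example, take a 2-pixel image shot twice: fuse_exposures([[100, 255], [200, 255]], [1.0, 2.0]). It gives 100.0 for the first pixel, where both shots agree on a radiance of 100. The second pixel is saturated in both shots, so it falls back to a plain average.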

Citations: 1

Chroma error analysis and compensation for heterogeneous video transcoding

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375962

Yu Liu

Different video coding standards implement the motion compensation (MC) algorithm with fine differences. These nuances may introduce serious chroma signal errors in heterogeneous video transcoding. In this paper, the Chroma Error Drift is tested and analyzed. We also propose a transcoding architecture to correct this error. According to our test results, this algorithm can settle the Chroma Error Drift with high quality and efficiency.
Index Terms: Chroma, Transcoding, Motion Compensation.

Citations: 1

A new bit estimation scheme for H.264 rate control

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375976

Hongtao Yu, F. Pan, Zhiping Lin

Rate control is a critical issue in the H.264 video coding standard. This paper aims at improving video quality at scene changes and high motion by accurately estimating the target bits in H.264 rate control. We define a new measure, namely motion complexity, to represent the amount of motion content between two consecutive frames. Motion complexity is closely correlated with the bits that have been allocated to the previously encoded frames. Based on motion complexity, we propose a new and simple scheme to estimate the target bits in rate control. Experimental results show that, compared with the H.264 proposal, our bit estimation scheme effectively reduces the sharp drops in peak signal-to-noise ratio (PSNR) at scene changes and high motion.

Citations: 9

A room acoustics design tool for MPEG-4 conforming scene design

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375903

U. Reiter, F. Korner, M. Kootz, S. Ruffer

This paper discusses issues regarding scene composition for 3D interactive audiovisual application systems. These systems generally provide auditory feedback to the user, so that a virtual room can be experienced both visually and acoustically. Until now, the design of the room's acoustic properties has always been separated from the visual modelling of the room. The main reason is that the tools scene designers normally use do not provide the features needed for an acoustic description. The work presented here is a successful approach to solving this problem. It simplifies the modelling task while considerably shortening the time needed to design such scenes.
Index Terms: Acoustic Design, Acoustic Texturing Tool, MPEG-4 AudioBIFS, Virtual Acoustics.

Citations: 1

Rate adaptation with hybrid ARQ based on cross layer information for satellite communication systems

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375927

Sunheui Ryoo, Sooyoung Kim, D. Ahn

We propose an efficient hybrid ARQ scheme based on cross-layer information for satellite communication systems, and we estimate its performance. System efficiency can be improved by adaptive-rate code transmission that uses cross-layer information. The rate adaptation scheme improves power efficiency while keeping packet overhead to a minimum. With this in mind, we combine hybrid ARQ with the adaptive coding scheme to improve performance. The proposed hybrid ARQ scheme, with cross-layer information and efficient retransmission, significantly improves the transmission performance of satellite communication systems.
Index Terms: Hybrid ARQ, Rate Compatible Codes, Cross Layer Information.
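The abstract does not spell out the retransmission rule. A common way to combine rate adaptation with hybrid ARQ is incremental redundancy over a rate-compatible code family. Cross-layer channel information picks the starting code rate, and each NACK triggers extra parity that lowers the effective rate. The sketch below assumes that scheme. The rate family and SNR thresholds are hypothetical placeholders, not values from the paper.

```python
from fractions import Fraction

# Hypothetical rate-compatible code family, highest rate (least redundancy) first.
RATES = [Fraction(4, 5), Fraction(2, 3), Fraction(1, 2), Fraction(1, 3)]
# Hypothetical minimum SNR (dB) at which each rate is chosen for the first transmission.
SNR_THRESHOLDS_DB = [8.0, 5.0, 2.0, float("-inf")]


def initial_rate_index(snr_db):
    """Use the cross-layer SNR report to pick the highest rate the channel supports."""
    for idx, threshold in enumerate(SNR_THRESHOLDS_DB):
        if snr_db >= threshold:
            return idx
    return len(RATES) - 1  # reached only for a NaN SNR report


def transmission_schedule(snr_db, nacks):
    """Effective code rate after the first transmission and after each NACK.

    Each NACK sends additional parity (incremental redundancy), lowering the
    effective rate by one step until the mother-code rate is reached.
    """
    start = initial_rate_index(snr_db)
    last = len(RATES) - 1
    return [RATES[min(start + k, last)] for k in range(nacks + 1)]
```

For example, transmission_schedule(6.0, 2) starts at rate 2/3, because 6 dB clears the hypothetical 5 dB threshold but not 8 dB. After two NACKs it steps down to 1/2 and then 1/3.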

Citations: 3

Description of audiovisual virtual 3D scenes: MPEG-4 perceptual parameters in the auditory domain

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375910

A. Dantele, U. Reiter

A high level of immersion can be provided for the user of virtual audiovisual environments when sound and visual impression are coordinated at a high quality level. Therefore, a comprehensive scene description language is needed for both the auditory and the visual part. The multimedia standard MPEG-4 provides a powerful tool-set for the scene description of 2D and 3D virtual environments. For the audio part, apart from a conventional physical description, a novel approach is available. It is based on perceptual parameters derived from psychoacoustic experiments. The practical suitability of this method is discussed for auditory and audiovisual 3D scenes. Enhancements are proposed to an example application of the perceptual approach that is included in the MPEG-4 standard, and an implementation for 3D audio rendering is introduced.
Index Terms: Auditory Scene Description, MPEG-4, Perceptual Parameters, Virtual Acoustics.

This work was conducted in the research group IAVAS (Interactive AudioVisual Application Systems), which is funded by the Thuringian Ministry of Science, Research and the Arts, Erfurt, Germany. Andreas Dantele and Ulrich Reiter are with the Institute of Media Technology at the Technische Universität Ilmenau, Ilmenau, Germany (e-mail: andreas.dantele@tu-ilmenau.de, ulrich.reiter@tu-ilmenau.de).

I. AUDIOVISUAL SCENE DESCRIPTION IN MPEG-4

The Moving Picture Experts Group (MPEG) has established novel approaches for the coding of multimedia content in the international standard MPEG-4. Auditory, visual and other content is subdivided into media objects, which together build a 2D or 3D scene. The most efficient coding scheme for each object can thus be chosen according to its media type, e.g. video, audio or graphics [1]. To combine the objects, MPEG-4 provides a powerful tool-set for scene description, the so-called BIFS (Binary Format for Scene Description) [2]. In BIFS, all the elements describing media objects and their properties are put together as nodes in a scene graph. The resulting structure reflects the mutual dependencies of the individual objects. This concept is based on the scene graph of the Virtual Reality Modeling Language (VRML) standard [3].

The audio part of this scene description (AudioBIFS) allows the behavior of sound-emitting objects in the scene to be specified, e.g. their position, level and directivity. These basic functionalities were extended in version 2 of AudioBIFS, which added new nodes, mainly for virtual acoustics in a 3D environment [4]. These nodes are often referred to as Advanced AudioBIFS (AABIFS) and are the main interest of the work described here. In general, the auralization of virtual scenes must not only reproduce the sound sources placed in the scene but also add ambient sound effects such as reverberation. The user can then feel the surrounding virtual space by listening to the acoustic cues.

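In a real room, a listener's auditory impression can be completely described by the impulse response. It represents the acoustic transfer function for a given pair of sound source and listener at their specific positions in a particular room [5]. When a non-reverberant source signal is convolved with the impulse response, the output is a reverberant sound signal. The result sounds as if perceived at the position in the room where the impulse response was recorded. The characteristic features of any impulse response can be extracted from the temporal distribution and energy content of its characteristic components: the direct sound, the early reflections and the late reverberation.

For virtual spaces, the auralization task can be defined as synthesizing these components to simulate an artificial impulse response. The main problem is how to derive, from the virtual scene description, a parameter set suitable for synthesizing the desired impulse response. When using Advanced AudioBIFS, the author of a virtual scene can choose between two approaches to auditory modeling:
(1) relying on physical properties, such as the frequency-dependent directivity of the sound sources and the absorption coefficients of materials;
(2) specifying the preferred sound impression through psychoacoustic parameters, which express the acoustic sensation perceived by the user.
For the second approach, a set of perceptual parameters based on psychoacoustic experimental research has been introduced.

II. THE PERCEPTUAL APPROACH OF MPEG-4

This approach is rather challenging because the focus shifts to the user and thus to human sensory perception. Parameters therefore have to be found that suit human needs and are easy to interpret and understand, even without much theoretical knowledge of sound propagation or electroacoustics. Auditory scene analysis is well elaborated in psychoacoustics, but a widely accepted language of subjective attributes has not yet emerged (the discussion is ongoing, e.g. [7]). Nevertheless, the technique underlying the MPEG-4 perceptual approach has already been used, for example to create virtual acoustic spaces [11].

We want to take a closer look at the perceptual approach in the MPEG-4 standard because it is especially suited to audiovisual environments. Reproducing audiovisual applications often exhausts the available processing power, and visual and graphics processing consume most of it. This is particularly true for interactive systems that must respond to every new user request in real time. For rendering the auditory part, it is therefore sometimes impossible to describe the sound sources, or the frequency-dependent behavior of reflections in virtual acoustics, in great detail. An explicit, and therefore measurable, description cannot be fulfilled in that case. A perceptual description that emphasizes the subjective qualities of the perceived sound sensation seems more appropriate.

A. Parameters

In the MPEG-4 perceptual approach, a set of nine parameters was chosen that should enable the author of an auditory scene to describe the acoustic impression of a sound event thoroughly. These are high-level parameters based on human perception. As explained below, they are related to objective criteria that correspond to features of the impulse response. The perceptual parameters can be divided into three groups: 1) source-related attributes, 2) room-related attributes and 3) attributes describing the interaction of source and room.

The first group covers attributes directly related to the sound source and thus to the impression of the direct sound. It comprises the amount of energy associated with the direct sound (SourcePresence), which gives the listener cues about the distance and directivity of the source. It also includes the early energy in the low (SourceWarmth) and high (SourceBrilliance) frequency bands. The second group brings together the characteristics of the surrounding acoustic space. It covers the damping during the late reverberation (LateReverberance) and the relative damping in the low (Heaviness) and high (Liveness) frequency bands. The third group contains parameters describing the behavior of the source in the room:
- the late energy distribution in the room (RoomPresence);
- the early decay time (RunningReverberance);
- the early energy distribution in the room relative to the direct sound (Envelopment).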
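In the auralization model of this paper, convolving a dry (non-reverberant) source signal with a room impulse response yields a signal that sounds as if heard at the position where the response was recorded. The sketch below shows that convolution directly. Its toy impulse response is assembled from the three components the paper names: direct sound, early reflections and late reverberation. Its delays, gains and smooth exponential tail are hypothetical; a measured late tail would be noise-like rather than smooth. This is an illustration of the principle, not an MPEG-4 renderer.

```python
def convolve(dry, impulse_response):
    """Full linear convolution y[n] = sum_k h[k] * x[n - k]."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def toy_impulse_response(early=((3, 0.5), (5, 0.3)), tail_start=8, tail_len=20,
                         tail_gain=0.2, decay=0.8):
    """Hypothetical synthetic impulse response built from its three components."""
    h = [0.0] * (tail_start + tail_len)
    h[0] = 1.0  # direct sound
    for delay, gain in early:
        h[delay] += gain  # discrete early reflections
    for i in range(tail_len):
        # Smooth exponential decay standing in for the (really noise-like) late reverberation.
        h[tail_start + i] += tail_gain * decay ** i
    return h


reverberant = convolve([1.0, 0.5, -0.25], toy_impulse_response())
```

Feeding a single unit impulse through convolve returns the impulse response itself, so convolve([1.0], h) == h is a quick sanity check.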

Citations: 2

Tied mixture modeling optimization for Korean-digit in the embedded ASR system

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1376017

Kihyeon Kim, Hanseok Ko

In embedded Automatic Speech Recognition (ASR) systems, the Semi-Continuous Hidden Markov Model (SCHMM), or Tied-Mixture (TM) model, is one of the most promising acoustic modeling methods. It solves the size problem of the existing Continuous Hidden Markov Model (CHMM) while minimizing recognition performance degradation. Moreover, for a general isolated-word task, context-dependent models such as tri-phones are used to guarantee high recognition performance in the embedded system. However, models constructed only in this way are not sufficient to improve the recognition rate in the Korean-digit speech task, where large mutual similarity exists. Hence, we construct new dedicated HMMs for all or parts of the Korean digits. These HMMs have exclusive states and use the same Gaussian pool as the previous tri-phone models. This remedial action preserves the structure of the entire set of HMMs while minimizing the memory occupied. In representative experiments, the word error rate on the Korean-digit task is expected to drop by about 56% compared with using only general tri-phone models.
Index Terms: Exclusive HMMs, Korean Digits, Tied Mixture Model, Embedded ASR System.
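In a tied-mixture (semi-continuous) HMM, every state scores an observation with the same pool of Gaussians and differs only in its mixture weights: b_j(x) = Σ_k c_jk N(x; μ_k, Σ_k). This explains how the dedicated Korean-digit HMMs described above can reuse the tri-phone models' Gaussian pool at little memory cost: each new state stores only a weight vector. The sketch below assumes diagonal covariances. The function names and the tiny pool are illustrative and not taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def tied_mixture_loglik(x, pool, weights):
    """log b_j(x) = log sum_k c_jk N(x; mu_k, var_k) over a Gaussian pool shared by all states.

    pool: list of (mean, var) pairs shared by every state (the tied codebook).
    weights: this state's mixture weights c_jk, one per pool entry, summing to 1.
    """
    terms = [math.log(c) + log_gauss_diag(x, m, v)
             for c, (m, v) in zip(weights, pool) if c > 0]
    if not terms:
        return float("-inf")
    top = max(terms)
    # Log-sum-exp keeps the sum numerically stable for very small densities.
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Consider a state whose weights over a pool of two unit-variance Gaussians are [1.0, 0.0]. It reduces to a single Gaussian, so at x = [0.0] it returns -0.5*log(2π).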

Citations: 2

Web-based query engine for content-based and semantic retrieval of audio

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1375994

Mustafa Sert, B. Baykal

In this study, we have developed a web-based query engine called AudioCBR that enables content-based and semantic queries on audio data. The interface supports Query-by-Example (QBE) and textual queries, as in traditional Information Retrieval (IR) systems. Our system also supports relevance feedback, which many similar systems lack. In QBE queries, matching is performed on pre-selected low-level audio features that are standardized in MPEG-7. Semantic queries are performed as textual queries. They use the object and event concepts of an audio clip, as well as the temporal and conceptual relationships between them. The originality of our approach to retrieval lies in the user interface provided, the form of description, and the data model used. The user interface is an important aspect of retrieving desired elements in emerging fields such as audio IR. Therefore, we describe some new graphical user interfaces that accommodate different modes of interaction with the user. An annotation tool is introduced to extract the audio semantics and low-level features from an audio clip. Finally, we give examples of semantic queries that our system supports.
Index Terms: Audio annotation tool, audio data model, MPEG-7, query interface for audio-IR.
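Query-by-example matching on low-level features amounts to ranking the stored clips by their feature distance to the example. The sketch below assumes each clip has already been summarized by a fixed-length feature vector, using Euclidean distance. The clip names and values are hypothetical stand-ins for MPEG-7 audio descriptors. For relevance feedback it substitutes the classic Rocchio update, because the abstract does not say which feedback method the system uses.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(query, database, top_k=3):
    """Rank (clip_id, feature_vector) entries by distance to the example's features."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [clip_id for clip_id, _ in ranked[:top_k]]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio feedback: move the query toward relevant and away from non-relevant clips."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]
```

Take the example query [1.0, 0.0] against a database of dog_bark [0.9, 0.1], speech [0.2, 0.8] and door_slam [0.8, 0.3]. The query ranks dog_bark first and door_slam second. After the user marks results, the refined vector from rocchio can be fed back into query_by_example.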

Citations: 4

Speech interactive agent system for car navigation using embedded ASR/TTS and DSR

IEEE International Symposium on Consumer Electronics, 2004

Pub Date : 2004-09-01 DOI: 10.1109/ISCE.2004.1376022

Heun-Ji Lee, O. Kwon, Hanseok Ko

This paper presents an efficient speech interactive agent that provides smooth car navigation and telematics services while enabling safe driving. It employs embedded automatic speech recognition (eASR), distributed speech recognition (DSR) and embedded text-to-speech (eTTS) modules. A speech interactive agent is essentially a conversational tool that provides drivers with command and control functions through natural voice/response interaction between user and interface. Examples include navigation tasks, audio/video manipulation and e-commerce services.

The system must cope with multiple random inputs from mute buttons, hands-free buttons, push-to-talk buttons, and events raised by service applications on the car navigation system. To do so, it provides resource negotiation rules that use priority control based on inter-process communication, a speech-interactive helper function, multi-threaded processing and exception handling. In addition, the hardware resources involved are often limited, and the internal communication protocols are complex, which makes real-time responses hard to achieve. Hardware-dependent architectural and algorithmic code optimization is therefore applied to improve performance. The proposed system has been tested and optimized in real car environments.
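The resource negotiation rules are described only as priority control over asynchronous inputs. A minimal sketch, assuming a fixed priority per input source and first-come-first-served order within a priority, is a lock-protected priority queue. The lock lets several input threads post events safely. The source names and priority values below are hypothetical.

```python
import heapq
import itertools
import threading

# Hypothetical priorities (lower number = served first); the paper's actual rules are not given.
PRIORITY = {"mute": 0, "push_to_talk": 1, "hands_free": 2, "service_event": 3}


class InputArbiter:
    """Orders asynchronous input events by priority, FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order
        self._lock = threading.Lock()

    def post(self, source, payload=None):
        if source not in PRIORITY:
            raise ValueError(f"unknown input source: {source}")
        with self._lock:
            heapq.heappush(self._heap, (PRIORITY[source], next(self._seq), source, payload))

    def next_event(self):
        """Return (source, payload) of the most urgent pending event, or None if idle."""
        with self._lock:
            if not self._heap:
                return None
            _, _, source, payload = heapq.heappop(self._heap)
            return source, payload
```

Posting a service event, then push-to-talk, then mute yields mute first, then push-to-talk, then the service event.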

Citations: 2
