Using automatic speech recognition and its possible effects on the voice (doi:10.21437/ICSLP.1998-778)
C. D. Bruijn, S. Whiteside, P. Cudd, D. Syder, K. Rosen, L. Nord
Literature and individual reports suggest that the use of speech-recognition-based human-computer interfaces could potentially lead to vocal fatigue, or even to symptoms associated with dysphonia. As more and more people opt for a speech-driven computer interface as an alternative input method to the keyboard, and these speech recognition systems become more and more widely used in both home and office environments, it has become necessary to quantify any potential risks of voice damage. This study reports on ongoing research investigating acoustic changes in the voice after use of a discrete speech recognition system. Acoustic analyses were carried out on two Swedish users of such a system. So far, for one of the users, two of the acoustic parameters under investigation that could be indicators of vocal fatigue show a significant difference directly before and after use of a speech recognition system.
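The abstract does not name the two acoustic parameters. Perturbation measures such as jitter are commonly used indicators of vocal fatigue, so a minimal, purely illustrative sketch of such a measure might look as follows; all names and values below are hypothetical, not the authors' method.

```python
# Hypothetical sketch: local jitter, a cycle-to-cycle pitch-period
# perturbation measure often used as a vocal-fatigue indicator.
# This is an illustrative assumption; the paper does not name its parameters.

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods,
    normalised by the mean period (dimensionless)."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_abs_diff / mean_period

# Example: pitch periods (in seconds) extracted from a sustained vowel
# recorded before and after a dictation session (values invented).
before = [0.0080, 0.0081, 0.0080, 0.0079, 0.0080]
after = [0.0080, 0.0084, 0.0077, 0.0083, 0.0078]
print(f"jitter before: {local_jitter(before):.4f}")
print(f"jitter after:  {local_jitter(after):.4f}")
```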
{"title":"Using automatic speech recognition and its possible effects on the voice","authors":"C. D. Bruijn, S. Whiteside, P. Cudd, D. Syder, K. Rosen, L. Nord","doi":"10.21437/ICSLP.1998-778","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-778","url":null,"abstract":"Literature and individual reports contain indications that the use of speech recognition based human computer interfaces could potentially lead to vocal fatigue, or even to symptoms associated with dysphonia. While more and more people opt for a speech driven computer interface as an alternative input method to the keyboard, and these speech recognition systems become more and widely used, both in the home and office environment, it has become necessary to qualify any potential risks of voice damage. This study reports about ongoing research that investigates acoustic changes in the voice, after use of a discrete speech recognition system. Acoustic analyses were carried out on two Swedish users of such a system. So far, for one of the users, two of the acoustic parameters under investigation that could be an indicator of vocal fatigue, show a significant difference directly before and after use of a speech recognition system.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129469558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The intellimedia workbench - a generic environment for multimodal systems (doi:10.21437/ICSLP.1998-266)
T. Brøndsted, Lars Bo Larsen, Michael Manthey, P. Kevitt, T. Moeslund, Kristian G. Olesen
This paper presents a generic environment for intelligent multimedia applications, denoted “The Intellimedia Workbench”. The aim of the workbench is to facilitate development and research within the field of multimodal user interaction. Physically, it is a table with various devices mounted above and around it: a camera and a laser projector mounted above the workbench, a microphone array mounted on the walls of the room, a speech recogniser, and a speech synthesiser. The camera is attached to a vision system capable of locating various objects placed on the workbench. The paper presents two applications utilising the workbench. One is a campus information system, allowing the user to ask for directions within a part of the university campus. The second is a pool trainer, intended to provide guidance to novice players.
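The paper publishes no API, but the module wiring described above can be pictured with a small mediator sketch: the vision system reports object positions on the table, and a central component routes a recognised query to the laser projector and the synthesiser. All class and method names below are hypothetical.

```python
# Illustrative sketch of how the workbench devices might be mediated.
# Names are hypothetical; the paper does not define this interface.

class VisionSystem:
    def locate_objects(self):
        # would return {object_id: (x, y)} table coordinates from the camera
        return {"building_a4": (120, 340)}

class LaserProjector:
    def point_at(self, xy):
        print(f"laser -> {xy}")

class Synthesiser:
    def say(self, text):
        print(f"TTS: {text}")

class Workbench:
    """Mediator that routes a recognised user query to the output devices."""
    def __init__(self):
        self.vision = VisionSystem()
        self.laser = LaserProjector()
        self.tts = Synthesiser()

    def handle_query(self, object_id):
        positions = self.vision.locate_objects()
        if object_id in positions:
            self.laser.point_at(positions[object_id])
            self.tts.say(f"{object_id} is here.")
        else:
            self.tts.say(f"I cannot see {object_id} on the table.")

Workbench().handle_query("building_a4")
```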
{"title":"The intellimedia workbench - a generic environment for multimodal systems","authors":"T. Brøndsted, Lars Bo Larsen, Michael Manthey, P. Kevitt, T. Moeslund, Kristian G. Olesen","doi":"10.21437/ICSLP.1998-266","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-266","url":null,"abstract":"The present paper presents a generic environment for intelligent multi media applications, denoted “The Intellimedia Work-Bench”. The aim of the workbench is to facilitate development and research within the field of multi modal user interaction. Physically it is a table with various devices mounted above and around. These include: A camera and a laser projector mounted above the workbench, a microphone array mounted on the walls of the room, a speech recogniser and a speech synthesiser. The camera is attached to a vision system capable of locating various objects placed on the workbench. The paper presents two applications utilising the workbench. One is a campus information system, allowing the user to ask for directions within a part of the university campus. The second application is a pool trainer, intended to provide guidance to novice players.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129652726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamical spectrogram, an aid for the deaf (doi:10.21437/ICSLP.1998-788)
A. Soltani-Farani, E. Chilton, R. Shirley
Visual perception of speech through spectrogram reading has long been a subject of research as an aid for the deaf or hearing impaired. Attributing the lack of success of this type of visual aid mainly to the static form of information presented by spectrograms, this paper proposes a system of dynamic visualisation for speech sounds. The system samples a high-resolution, auditory-based spectrogram with a window of 20 milliseconds duration and, by exploiting the periodicity of the input sound, produces a phase-locked sequence of images. This sequence is then animated at a rate of 50 images per second to produce a movie-like image displaying both the time-varying and time-independent information of the underlying sound. Results of several preliminary experiments evaluating the potential usefulness of the system for the deaf, undertaken by normal-hearing subjects, support the quick learning and persistence of the gestures for small sets of single words and motivate further investigation.
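A minimal sketch of the frame-extraction stage described above (20 ms analysis windows, animated at 50 images per second), assuming NumPy. The paper phase-locks each window to the pitch period of the input; that step is omitted here for brevity, so this is an approximation rather than the authors' exact algorithm.

```python
# Approximate sketch: 20 ms windows, one new spectral image per 1/50 s.
# Phase-locking to the pitch period (as in the paper) is omitted.
import numpy as np

def spectrogram_frames(signal, fs, win_ms=20.0, fps=50):
    win = int(fs * win_ms / 1000)       # 20 ms window length in samples
    hop = int(fs / fps)                 # step so playback at 50 fps is real-time
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win, hop):
        seg = signal[start:start + win] * window
        mag = np.abs(np.fft.rfft(seg))  # magnitude spectrum of this frame
        frames.append(20 * np.log10(mag + 1e-10))  # dB scale
    return np.array(frames)             # shape: (n_images, n_bins)

fs = 16000
t = np.arange(fs) / fs
frames = spectrogram_frames(np.sin(2 * np.pi * 200 * t), fs)
print(frames.shape)  # roughly 50 images for one second of audio
```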
{"title":"Dynamical spectrogram, an aid for the deaf","authors":"A. Soltani-Farani, E. Chilton, R. Shirley","doi":"10.21437/ICSLP.1998-788","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-788","url":null,"abstract":"Visual perception of speech through spectrogram reading has long been a subject of research, as an aid for the deaf or hearing im-paired. Attributing the lack of success in this type of visual aids mainly to the static form of information presented by the spectrograms, this paper proposes a system of dynamic visualisation for speech sounds. This system samples a high resolved, auditory-based spectrogram, with a window of 20 milliseconds duration, so that exploiting the periodicity of the input sound, it produces a phase-locked sequence of images. This sequence is then animated at a rate of 50 images per second to produce a movie-like image displaying both the time-varying and time-independent information of the underlying sound. Results of several preliminary experiments for evaluation of the potential usefulness of the system for the deaf, undertaken by normal-hearing subjects, support the quick learning and persistence of the gestures for small sets of single words and motivate further investigations.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129669510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The acoustic and perceptual features of tone in the tibeto-burman language ao naga (doi:10.21437/ICSLP.1998-102)
A. Coupe
The tonemes of the Waromung Mongsen dialect of Ao Naga, a Tibeto-Burman language of northeast India, are described with respect to their auditory and acoustic features. Even though rather small F0 differences are found to separate the contrasting tonemes, the results of a perception test nevertheless demonstrate that these small differences are perceptually salient to a native speaker and are readily identifiable. The dialects demonstrate varying degrees of phonological, morphological and lexical divergence. A preliminary survey suggests that every village speaks its own variety; native speakers report that the unique village-specific characteristics of each variety serve as shibboleths identifying their speakers' origin.
{"title":"The acoustic and perceptual features of tone in the tibeto-burman language ao naga","authors":"A. Coupe","doi":"10.21437/ICSLP.1998-102","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-102","url":null,"abstract":"The tonemes of the Waromung Mongsen dialect of Ao Naga, a Tibeto-Burman of northeast India, are described with respect to their auditory and acoustic features. Even though rather small FO differences are found to separate each contrasting toneme, the results of a perception test nevertheless demonstrate that these small differences are perceptually salient to a native speaker and are readily identifiable. Each two demonstrate varying degrees of phonological, morphological and lexical divergence. A preliminary survey suggests that every village speaks its own variety; native speakers report that the unique village-specific characteristics of each variety serve as shibboleths to identify their speakers’ origin. Tonal across","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130421694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the effects of speech rate upon parameters of the command-response model for the fundamental frequency contours of speech (doi:10.21437/ICSLP.1998-131)
S. Ohno, H. Fujisaki, Yoshikazu Hara
A command-response model for the process of F0 contour generation has been presented by Fujisaki and his coworkers. The present paper describes the results of a study on the variability and speech-rate dependency of the model's parameters in utterances of a speaker of Japanese. It was found that the parameters α and β can be considered practically constant at a given speech rate, while Fb may vary slightly from utterance to utterance. Among these three parameters, only α was found to have a small but systematic tendency to increase with speech rate.
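For reference, the model in question is the well-known superpositional formulation by Fujisaki and coworkers: the log F0 contour is the sum of a baseline value Fb, phrase components with magnitudes Ap and onset times T0, and accent components with amplitudes Aa active between T1 and T2. The parameters α and β studied here are the time constants of the phrase and accent control mechanisms.

```latex
% Command-response (Fujisaki) model: log F0 as baseline plus
% superposed phrase and accent components.
\[
  \ln F_0(t) = \ln F_b
    + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i})
    + \sum_{j=1}^{J} A_{a_j}\, \bigl[ G_a(t - T_{1j}) - G_a(t - T_{2j}) \bigr]
\]
% Phrase control: impulse response of a critically damped second-order
% system with time constant alpha.
\[
  G_p(t) =
    \begin{cases}
      \alpha^2 t\, e^{-\alpha t}, & t \ge 0\\
      0, & t < 0
    \end{cases}
\]
% Accent control: step response with time constant beta, ceiling gamma.
\[
  G_a(t) =
    \begin{cases}
      \min\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr], & t \ge 0\\
      0, & t < 0
    \end{cases}
\]
```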
{"title":"On the effects of speech rate upon parameters of the command-response model for the fundamental frequency contours of speech","authors":"S. Ohno, H. Fujisaki, Yoshikazu Hara","doi":"10.21437/ICSLP.1998-131","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-131","url":null,"abstract":"A command-response model for the process of F0 contour generation has been presented by Fujisaki and his coworkers. The present paper describes the results of a study on the variabilty and speech rate dependency of the model’s parameters in utterances of a speaker of Japanese. It was found that parameters α and β can be considered to be practically constant at a given speech rate, while Fb may vary slightly from utterance to utterance. Among these three parameters, only α was found to have a small but systematic tendency to increase with the speech rate.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130501222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lexical activation by assimilated and reduced tokens (doi:10.21437/ICSLP.1998-440)
M. L. Kelly, E. Bard, Catherine Sotillo
Running speech contains abundant assimilated and phonologically reduced tokens, but there is considerable debate about how such varied pronunciations disrupt access to the corresponding words in listeners' mental lexicons. While previous studies have examined the effects of carefully produced or electronically edited reductions, we present two experiments which compare cross-modal repetition priming for lexical decision by more reduced spontaneous forms and less reduced read forms of the same words uttered by the same speakers in the same phrases. Though less priming is found for the more reduced spontaneous tokens, both versions of words produce significant priming effects, whether the majority of stimuli are taken from spontaneous speech (Experiment 1) or from read speech (Experiment 2). Priming is more robust if the tokens themselves contain the context licensing the reduction.
{"title":"Lexical activation by assimilated and reduced tokens","authors":"M. L. Kelly, E. Bard, Catherine Sotillo","doi":"10.21437/ICSLP.1998-440","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-440","url":null,"abstract":"Running speech contains abundant assimilated and phonologically reduced tokens, but there is considerable debate about how such varied pronunciations disrupt access to the corresponding words in listeners’ mental lexicons. While previous studies have examined the effects of carefully produced or electronically edited reductions, we present two experiments which compare cross-modal repetition priming for lexical decision by more reduced spontaneous forms and less reduced read forms of the same words uttered by the same speakers in the same phrases. Though less priming is found for the more reduced spontaneous tokens, both versions of words produce significant priming effects, whether the majority of stimuli are taken from spontaneous speech (Experiment 1) or from read speech (Experiment 2). Priming is more robust if the tokens themselves contain the context licensing the reduction.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127013762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a Chinese text-to-speech system with higher naturalness (doi:10.21437/ICSLP.1998-47)
Ren-Hua Wang, Qingfeng Liu, Yongsheng Teng, Deyu Xia
This paper presents our research efforts on Chinese text-to-speech towards higher naturalness. The main results can be summarized as follows: 1. In the proposed TTS system, syllable-sized units were cut out from real recorded speech, and the synthetic speech was generated by concatenating these units back together. 2. The integration of units synthesized by rules with natural units was tested: an LMA-filter-based synthesizer was successfully developed to generate those units that were difficult to collect from the speech corpus. 3. A new, efficient Chinese character coding scheme, the "Yin Xu Code" (YX Code), has been developed to assist the GB Code. Based on the above results, a Chinese text-to-speech system named "KD-863" has been developed. In the national assessment of Chinese TTS systems held at the end of March 1998 in Beijing, the system achieved first place in the naturalness MOS (Mean Opinion Score).
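As an illustration of the fallback scheme in points 1 and 2, a sketch of syllable-unit selection in which rule-generated (e.g. LMA-synthesised) units fill gaps in the recorded corpus might look like this. The data structures and names are hypothetical, not the KD-863 implementation.

```python
# Illustrative sketch: prefer syllable units cut from recorded speech,
# fall back to rule-generated units for syllables missing from the corpus.
# Placeholder byte strings stand in for real waveform data.

natural_units = {"ni3": b"...", "hao3": b"..."}   # excised from recordings
synthetic_units = {"zhuang4": b"..."}             # generated by rule (LMA filter)

def select_unit(syllable):
    if syllable in natural_units:
        return natural_units[syllable]
    return synthetic_units[syllable]   # KeyError if neither source has it

def synthesise(pinyin_syllables):
    """Concatenate the per-syllable waveforms back together."""
    return b"".join(select_unit(s) for s in pinyin_syllables)

print(len(synthesise(["ni3", "hao3", "zhuang4"])))
```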
{"title":"Towards a Chinese text-to-speech system with higher naturalness","authors":"Ren-Hua Wang, Qingfeng Liu, Yongsheng Teng, Deyu Xia","doi":"10.21437/ICSLP.1998-47","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-47","url":null,"abstract":"This paper presents our research efforts on Chinese text-to-speech towards higher naturalness, the main results can be summarized as follows: 1. In the proposed TTS system the syllable-sized units were cut out from the real recorded speech, the synthetic speech was generated by concatenating these units back together. 2. The integration of units synthesized by rules with natural units was tested. A LMA filter based synthesizer was developed successfully to test and generate those units, which were difficult to be collected from the speech corpus. 3. A new efficient Chinese character coding scheme - \"Yin Xu Code\"(YX Code) has been developed to assist the GB Code. Based on above results, a Chinese text-to-speech system named as \"KD-863\" has been developed. In the national assessment of Chinese TTS systems held at the end of March 1998 in Beijing, the system achieved a first of the naturalness MOS (Mean Opinion Score).","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129126930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A linguistic and prosodic database for data-driven Japanese TTS synthesis (doi:10.21437/ICSLP.1998-57)
A. Sakurai, Takashi Natsume, K. Hirose
We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.
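One way to picture the record layout implied by the database description above; the field names and shapes are assumptions for illustration, not the paper's schema.

```python
# Hypothetical sketch of one utterance record in such a database.
# Field names and value shapes are assumptions, not the published schema.
record = {
    "orthography": "...",                      # orthographic transcription
    "phones": [("k", 0.062), ("o", 0.118)],    # phonetic labels with durations (s)
    "pos_tags": ["noun", "particle"],          # part-of-speech tags
    "accent_types": [0, 1],                    # accent type per accentual phrase
    "f0_labels": {                             # superpositional-model parameters
        "phrase_commands": [(0.10, 0.45)],        # (onset time, magnitude)
        "accent_commands": [(0.25, 0.60, 0.30)],  # (onset, offset, amplitude)
    },
}
```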
{"title":"A linguistic and prosodic database for data-driven Japanese TTS synthesis","authors":"A. Sakurai, Takashi Natsume, K. Hirose","doi":"10.21437/ICSLP.1998-57","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-57","url":null,"abstract":"We propose a method to generate a database that contains a parametric representation of F0 contours associated with linguistic and acoustic information, to be used by data-driven Japanese text-to-speech (TTS) systems. The configuration of the database includes recorded speech, F0 contours and their parametric labels, phonetic transcription with durations, and other linguistic information such as orthographic transcription, part-of-speech (POS) tags, and accent types. All information that is not available by dictionary lookup is obtained automatically. In this paper, we propose a method to automatically obtain parametric labels that describe F0 contours based on a superpositional model. Preliminary tests on a small data set show that the method can find the parametric representation of F0 contours with acceptable accuracy, and that accuracy can be improved by introducing additional linguistic information.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123838535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Categorical perception: important phenomenon or lasting myth? (doi:10.21437/ICSLP.1998-463)
D. Massaro
Categorical perception, or the perceived equality of instances within a phoneme category, has been a central concept in the experimental and theoretical investigation of speech perception. It can be found as fact in most introductory textbooks in perception, cognition, linguistics and cognitive science. This paper analyzes the reasons for the persistent endurance of this concept. A variety of empirical and theoretical research findings are described in order to inform and hopefully to provide a more critical look at this pervasive concept. Given the demise of categorical perception, it is necessary to shift our theoretical focus to how multiple sources of continuous information are processed to support the perception of spoken language.
{"title":"Categorical perception: important phenomenon or lasting myth?","authors":"D. Massaro","doi":"10.21437/ICSLP.1998-463","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-463","url":null,"abstract":"Categorical perception, or the perceived equality of instances within a phoneme category, has been a central concept in the experimental and theoretical investigation of speech perception. It can be found as fact in most introductory textbooks in perception, cognition, linguistics and cognitive science. This paper analyzes the reasons for the persistent endurance of this concept. A variety of empirical and theoretical research findings are described in order to inform and hopefully to provide a more critical look at this pervasive concept. Given the demise of categorical perception, it is necessary to shift our theoretical focus to how multiple sources of continuous information are processed to support the perception of spoken language.","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123846183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transform coding of LSF parameters using wavelets","authors":"D. Petrinović","doi":"10.21437/ICSLP.1998-394","DOIUrl":"https://doi.org/10.21437/ICSLP.1998-394","url":null,"abstract":"","PeriodicalId":117113,"journal":{"name":"5th International Conference on Spoken Language Processing (ICSLP 1998)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114366360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}