An analysis of natural vs. synthetic speech intelligibility: A preliminary appraisal of a reading machine for the blind
P. W. Nye, J. H. Gaitenby, J. D. Hankins
ASSETS. Annual ACM Conference on Assistive Technologies, vol. 12, no. 1, pp. 392-393. Published 1973-08-27.
DOI: https://doi.org/10.1145/800192.805741
Citations: 1
Abstract
With Veterans Administration support, Haskins Laboratories has been developing a method of speech synthesis for automatically reading printed text aloud, with the goal of applying this technique to a practical reading machine for blind people. The laboratory prototype, as it exists today, takes its input from a low-cost optical character recognition (OCR) device capable of reading (i.e., recognizing the print of) typewritten pages. The machine-readable orthographic text produced by the OCR reader is then processed by a dictionary program that converts the input words to phonetic form. This program assigns stress and intonation symbols according to rules based on word type, context, and sentence punctuation. The resulting phonetic code is then displayed to an editor, who can insert corrections if necessary before synthesis of the sentences begins. (Eventually the program will operate with no editorial intervention.)
A series of intelligibility tests has been administered to both blind and sighted students at the University of Connecticut* under conditions that allowed their listening performance with synthetic speech to be compared with their performance with natural speech.
The tests, which are still in progress, indicate that perceiving synthetic speech places somewhat heavier demands on a listener's language-processing capacity than perceiving natural speech does. However, this increased load appears to interact strongly with the subject matter of the material, its syntactic structure, the punctuation provided in the text, and the speaking rate of the output. An analysis of the results of this continuing evaluation study will be presented at the Conference.
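The text-to-phonetic stage described in the abstract (dictionary lookup, then stress and intonation symbols assigned by rule) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny dictionary, the ARPAbet-style phonetic spellings, the function-word rule, the punctuation-to-intonation mapping, and the name `to_phonetic` are hypothetical and are not taken from the Haskins system.

```python
import re

# Hypothetical toy pronouncing dictionary (ARPAbet-style spellings).
# Illustrative only; not the dictionary used by the prototype.
PHONETIC_DICT = {
    "the": "DH AH",
    "machine": "M AH SH IY N",
    "reads": "R IY D Z",
    "text": "T EH K S T",
}

# Assumed word-type rule: function words are left unstressed.
FUNCTION_WORDS = {"the", "a", "an", "of", "to"}

# Assumed punctuation rule: each mark maps to one intonation symbol.
INTONATION = {".": "<fall>", "?": "<rise>", ",": "<continue>"}


def to_phonetic(sentence):
    """Convert orthographic text to phonetic tokens with stress and intonation marks."""
    # Split into words and the punctuation marks the rules know about.
    tokens = re.findall(r"[A-Za-z']+|[.,?]", sentence)
    out = []
    for tok in tokens:
        if tok in INTONATION:
            out.append(INTONATION[tok])
            continue
        word = tok.lower()
        phones = PHONETIC_DICT.get(word)
        if phones is None:
            # Unknown word: flag it so a human editor can correct it before synthesis.
            out.append("?" + word + "?")
            continue
        # Prefix content words with a stress mark; leave function words bare.
        stress = "" if word in FUNCTION_WORDS else "'"
        out.append(stress + phones)
    return out


if __name__ == "__main__":
    print(to_phonetic("The machine reads text."))
```

Words missing from the dictionary are flagged rather than guessed, which mirrors the editorial correction step the abstract mentions. A fuller system would presumably also need letter-to-sound rules and context-dependent stress, which this sketch does not attempt.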