Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341526
A. Gorin, H. Hanek, R. C. Rose, L. Miller
Considers the task of automated call routing in a telecommunications network. A customer who desires some service would dial a single universal number, which prompts them with "Hello, how may I help you?" The caller then responds to this prompt in unconstrained fluent speech, on which basis the call is automatically routed to an appropriate destination. We report on an on-line experimental system that explores the feasibility of this concept; it comprises an information-theoretic connectionist network embedded in a feedback control system. Experimental results are reported for a database of recorded customer/operator dialogs.
Title: Automated call routing in a telecommunications network. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
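The call-routing abstract above does not describe the network in detail. As a minimal sketch only, assume that each word-to-destination connection weight is the pointwise mutual information between the word and the destination, estimated from labeled transcripts, and that a call goes to the destination with the largest summed weight. The training utterances, destination names, and smoothing constant below are invented for illustration and are not taken from the paper; the feedback control loop is not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled transcripts: (caller utterance, destination).
SAMPLES = [
    ("i want to make a collect call", "collect"),
    ("collect call please", "collect"),
    ("i have a question about my bill", "billing"),
    ("there is a charge on my bill i do not understand", "billing"),
    ("what is the area code for chicago", "directory"),
    ("i need the number for a restaurant", "directory"),
]

def train_weights(samples, smoothing=0.5):
    """Return weights[word][route] = log(P(route | word) / P(route))."""
    route_counts = Counter(route for _, route in samples)
    word_route = defaultdict(Counter)
    for text, route in samples:
        for word in set(text.lower().split()):
            word_route[word][route] += 1
    routes = sorted(route_counts)
    total = sum(route_counts.values())
    weights = {}
    for word, counts in word_route.items():
        n_word = sum(counts.values())
        weights[word] = {}
        for route in routes:
            # Additive smoothing keeps unseen word/route pairs finite.
            p_route_given_word = (counts[route] + smoothing) / (
                n_word + smoothing * len(routes)
            )
            p_route = route_counts[route] / total
            weights[word][route] = math.log(p_route_given_word / p_route)
    return weights, routes

def route_call(utterance, weights, routes):
    """Sum the weights of the known words and pick the best destination."""
    scores = {route: 0.0 for route in routes}
    for word in set(utterance.lower().split()):
        for route, weight in weights.get(word, {}).items():
            scores[route] += weight
    return max(routes, key=lambda route: scores[route])

WEIGHTS, ROUTES = train_weights(SAMPLES)
print(route_call("i would like to place a collect call", WEIGHTS, ROUTES))
```

Words that never occurred in training contribute nothing to any score, so the decision rests on the words that carry information about a destination.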
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341524
T.I. Boogaart, L. Bos, L. Boves
The paper gives a short description of the creation of the Dutch POLYPHONE corpus. It then shows how that corpus has already been put to use in the development of a number of practical applications and how it is used for research purposes. Applications that are touched upon include a fully automatic train timetable information system and the automation of collect calls and phone card calls. The paper concludes with remarks on and recommendations for future corpus development.
Title: Use of the Dutch POLYPHONE corpus for application development. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341548
N. Sugamura, T. Hirokawa, S. Sagayama, S. Furui
The paper describes major research and development in speech recognition and synthesis technologies at NTT from the viewpoint of telecommunications applications. Technologies include speaker-dependent and speaker-independent word recognition based on DP (dynamic programming) matching, speaker-independent word spotting based on HMM, large-vocabulary speaker-independent continuous speech recognition based on HMM-LR, and high-quality Japanese text-to-speech synthesis. A commercial ANSER system that uses speech recognition and synthesis technologies is also introduced.
Title: Speech processing technologies and telecommunications applications at NTT. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
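The NTT abstract above mentions word recognition based on DP matching, a technique also widely known as dynamic time warping. The following is a minimal sketch of the general technique, not NTT's implementation. It assumes one-dimensional feature sequences and an absolute-difference local cost, and the word templates are made up for illustration.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def recognize(utterance, templates):
    """Return the template word with the smallest alignment cost."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical stored templates, one per vocabulary word.
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 1, 2]}

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))          # 0.0: only the timing differs
print(recognize([1, 1, 3, 4, 4, 3, 1], TEMPLATES))    # yes
```

A speaker-dependent system of this kind would store templates recorded by the user; a speaker-independent one needs templates or models that cover many speakers.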
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341541
N. Hataoka, T. Odaka, A. Amano
The paper proposes a new speech recognition system based on CSS (client and server system) architecture and describes a telephone application that combines an existing telephone PBX system with an OA (office automation) system consisting of personal computers and fax machines. The server, which is separate from the application software, is mainly for speech recognition, and the client is an application-driven processor that includes a speech input part and the application software. The CSS-based speech recognition system makes it much easier to apply the installed speech recognition server to various applications without major changes to the processing algorithm or architecture. The authors also propose a new method for building acoustic models that cover the variety of speech features across speakers. This method is based on the automatic generation of speech recognition units from a large speech database.
Title: Speech recognition system for automatic telephone operator based on CSS architecture. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341544
Y. Yamazaki, T. Morimoto
ATR Interpreting Telecommunications Research Laboratories (ATR-ITL) was established in 1993 with the aim of establishing fundamental technologies for spontaneous speech translation. To overcome the many phenomena that make spontaneous speech translation difficult, the authors are carrying out research on spontaneous speech recognition, prosody processing and synthesis of natural-sounding speech, language translation, and system integration. Furthermore, they are constructing a large-scale integrated speech and language database that will be used for analysis of acoustic and linguistic characteristics.
Title: ATR research activities on speech translation. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341521
D. J. Krasinski, R.A. Sukkar
AT&T has introduced a network call routing service that uses automatic speech recognition (ASR) to let callers select from a menu of choices by voice. The requirements of the service posed a number of challenges for the technology to meet. The paper describes the evolution of the service over time and discusses a number of key issues and how they were addressed.
Title: Automatic speech recognition for network call routing. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341520
W. E. Longenbaker, R. J. Perdue, S. M. Salchenberger
The challenges and design decisions associated with the successful deployment of speech recognition technology in a telephone network-based system providing automated operator services are described. The relevant advances in operator automation that preceded the present work are briefly reviewed, and the requirements for the introduction of automated speech recognition (ASR) into the automated alternate billing service (AABS) are outlined. The enabling technology advances, including word spotting and barge-in, are highlighted, and the overall deployment strategy for the application is discussed. The billing type and billing acceptance recognition tasks, which use speaker-independent whole-word models, are described and field results are reported. The "bootstrapping" approach to data collection and model improvement is summarized. Next, the current results of the billing detail recognition trials using connected-digit ASR are described. These trials are ongoing and involve the recognition of 10-digit telephone numbers and 14-digit calling card numbers. The AABS-ASR service has been fully deployed across the United States and handles over 50 million calls a month for AT&T alone.
Title: Automation of operator services: a successful application of speech recognition technology. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
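The operator-services abstract above mentions recognition of 10-digit telephone numbers and 14-digit calling card numbers. A short calculation shows why long digit strings are demanding. It rests on a simplifying assumption that is not from the paper: every digit is recognized independently with the same accuracy p, so a string of n digits is entirely correct with probability p to the power n. The 99 percent figure is an illustrative input, not a reported field result.

```python
def string_accuracy(per_digit_accuracy, n_digits):
    """Probability that all n digits are right, assuming independent errors."""
    return per_digit_accuracy ** n_digits

def required_digit_accuracy(target_string_accuracy, n_digits):
    """Per-digit accuracy needed to reach a target whole-string accuracy."""
    return target_string_accuracy ** (1.0 / n_digits)

for n in (10, 14):
    print(n, round(string_accuracy(0.99, n), 3))
    print(n, round(required_digit_accuracy(0.95, n), 4))
```

Under this assumption, 99 percent per-digit accuracy gives roughly 90 percent correct 10-digit strings and roughly 87 percent correct 14-digit strings, so small per-digit gains matter a great deal.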
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341549
M. Koo, I. Sohn, Che-Hong Ahn
Presents a large-vocabulary, speaker-independent speech recognition system (KT-STOCK) and describes its experimental field trial over the telephone network. KT-STOCK is an information retrieval system with which one can obtain the current price of a stock by saying one of the 712 stock names listed on the Korea stock exchange. The system is an HMM (hidden Markov model)-based isolated speech recognizer that uses the phoneme-like unit as its basic unit. KT-STOCK has been under experimental field trials since June 24, 1994. Preliminary results show that performance in the simulation environment differs from that in the real environment. Currently, the authors have achieved a recognition rate of 61.9% in the real environment.
Title: An experimental field trial of a large vocabulary speaker independent recognition system. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
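The KT-STOCK abstract above describes an HMM-based isolated-word recognizer built from phoneme-like units. The sketch below illustrates the general idea only and is not the authors' system. It assumes each word is a left-to-right chain of units, each acoustic frame has already been quantized to a single symbol, and the probabilities are toy values. The two-word lexicon is hypothetical; the real vocabulary had 712 stock names.

```python
import math

NEG_INF = float("-inf")
LOG_MATCH = math.log(0.8)      # toy P(frame symbol equals the unit's own symbol)
LOG_MISMATCH = math.log(0.05)  # toy P(any one other symbol | unit)
LOG_STAY = math.log(0.5)       # toy self-loop probability
LOG_NEXT = math.log(0.5)       # toy probability of advancing to the next unit

def log_emit(unit, frame_symbol):
    """Log emission probability of a frame symbol from a unit state."""
    return LOG_MATCH if unit == frame_symbol else LOG_MISMATCH

def viterbi_log_score(frames, units):
    """Best-path log score of the frames through a left-to-right unit chain."""
    score = [NEG_INF] * len(units)
    score[0] = log_emit(units[0], frames[0])  # the path starts in the first unit
    for frame in frames[1:]:
        previous = score
        score = []
        for j, unit in enumerate(units):
            stay = previous[j] + LOG_STAY
            advance = previous[j - 1] + LOG_NEXT if j > 0 else NEG_INF
            score.append(max(stay, advance) + log_emit(unit, frame))
    return score[-1]                          # the path ends in the last unit

def recognize(frames, lexicon):
    """Pick the word whose unit chain explains the frames best."""
    return max(lexicon, key=lambda word: viterbi_log_score(frames, lexicon[word]))

# Hypothetical lexicon: word -> sequence of phoneme-like units.
LEXICON = {"hana": ["h", "a", "n", "a"], "hanil": ["h", "a", "n", "i", "l"]}

print(recognize(list("hhaannaa"), LEXICON))
```

Because words share units, adding a word to the vocabulary only requires a new lexicon entry rather than a new whole-word model, which is one common reason for choosing sub-word units in large-vocabulary systems.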
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341527
M. Kitai, H. Nishi
We developed an experimental voice-activated telephone intermediary system in 1993. It is intended to accept the caller's message and to transfer the call to an appropriate number according to the callee's schedule and the callee's service settings. A caller can use such services by speaking his name, the callee's name, confirmation words, his phone number, and his message, in that order, in response to system prompts. An experiment was conducted with 11 callees, all laboratory researchers. A total of 139 calls were logged during the 22-day experiment. The system usage ratio and service completion ratio were 53 percent and 86 percent, respectively. The paper describes the experiment and its results, and discusses dialog designs that minimize recognition errors and encourage callers to start and continue using the system.
Title: The evaluation of trial results for a voice activated telephone intermediary system. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341547
R. Billi, F. Canavesio, A. Ciaramella, L. Nebbia
The paper is a survey of the speech technologies and applications developed at CSELT, some of which are employed in real services on the Italian telephone network. This represents a major extension of the centre's previous activity, which was essentially oriented to research, into a broader set of activities that range from defining and experimenting with new algorithmic approaches to speech product engineering and application development. In particular, the paper describes two applications in the field trial phase: one is an automatic operator that provides voice selection from large name directories, and the other is an automated network service for directory assistance, which is now accessible to all Italian telephone customers.
Title: Interactive voice technology at work: the CSELT experience. Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, 1994.