Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341545
J.E. Tschirgi
The paper provides an overview of AT&T's collective efforts to develop voice and audio processing (VAP) for network applications. The first section of the paper discusses AT&T's focused efforts to manage technology resources from across the company. Next, key technology challenges within the public network market are discussed. These include required advances in technologies for new and existing markets, and the interaction of VAP technologies with specific applications. Finally, the paper discusses key challenges for managing the technology, including the transition from low-risk to higher-risk markets and the importance of setting expectations for technology performance.
Title: Voice and audio processing for telephony applications in AT&T. Published in: Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications.
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341552
C. Kamm, C. Shamieh, S. Singhal
Telephone companies in the United States handle over 6 billion directory assistance (DA) calls each year. Automation of even a portion of DA calls could significantly reduce the cost of DA services. The paper explores two factors affecting successful automation of DA: a) the effect of directory size on speech recognition performance, and b) the complexity of existing DA call interactions. Speech recognition performance for a set of 200 spoken names was measured for directories ranging from 200 to 1.5 million unique names. Recognition accuracy decreased from 82.5 percent for a 200-name directory to 18.5 percent for a 1.5 million name directory. In part because high recognition accuracy is not easily achievable for these very large, low-context directories, it is likely that initial implementations of DA automation will focus on a small percentage of calls, requiring a smaller vocabulary. To maximize the potential savings, listings that are most frequently requested appear to be the optimal vocabulary. To identify critical issues in automating frequent DA requests, approximately 13,000 DA calls from an office near a major metropolitan area in the United States were studied. In this sample, 245 listings covered 10 percent of the call volume, and 870 listings covered 20 percent of the call volume.
Title: Speech recognition issues for directory assistance applications. Published in: Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications.
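The coverage figures in the abstract above (how many of the most frequently requested listings account for a given share of call volume) can be illustrated with a minimal sketch. The function name `listings_needed` and the toy call log are hypothetical and are not taken from the paper; the sketch only shows the cumulative-frequency calculation, not the authors' actual analysis.

```python
from collections import Counter


def listings_needed(requests, target_fraction):
    """Return how many of the most frequently requested listings are
    needed to cover at least target_fraction of all calls.

    requests: iterable of listing identifiers, one per call (hypothetical format).
    target_fraction: share of call volume to cover, in the range (0, 1].
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    counts = Counter(requests)
    total = sum(counts.values())
    covered = 0
    # Walk listings from most to least requested, accumulating call volume.
    for rank, (_, n_calls) in enumerate(counts.most_common(), start=1):
        covered += n_calls
        if covered >= target_fraction * total:
            return rank
    return len(counts)


# Toy call log (made-up data): 10 calls spread over 4 listings.
calls = ["pizza"] * 5 + ["taxi"] * 3 + ["bank"] + ["florist"]
print(listings_needed(calls, 0.5))  # the single most requested listing covers half the calls
print(listings_needed(calls, 0.8))  # the two most requested listings cover 8 of 10 calls
```

Applied to a real call log, the same calculation would yield figures of the kind the abstract reports, such as the number of listings covering 10 or 20 percent of call volume.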
Pub Date: 1994-09-26; DOI: 10.1109/IVTTA.1994.341538
T. Matsumura, S. Matsunaga
A novel acoustic modeling algorithm that generates non-uniform unit HMMs to effectively cope with spectral variations in fluent speech is proposed. The algorithm is devised for the automatic iterative generation of long-span units for the non-uniform modeling. This generation algorithm is based on an entropy reduction criterion using text data and a maximum likelihood criterion using speech data. The effectiveness of the non-uniform models was confirmed by comparing likelihood values between the long-span unit HMMs and the conventional phoneme-unit HMMs. Preliminary results suggest that non-uniform unit HMMs achieve higher performance than phoneme-unit HMMs.
Title: Towards non-uniform unit HMMs for speech recognition. Published in: Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications.
Pub Date: 1994-07-07; DOI: 10.1109/IVTTA.1994.341533
H. Nishi, M. Kitai
New dialog-promoting methods for voice storage dialog systems are discussed. The paper aims to put at ease callers who hesitate or are shy, either by designing an interactive dialog or by offering attractive information to the caller. Two hypotheses are introduced. First, the less information per utterance the system requires, the more comfortable the user feels. Second, the more attractive the information the system gives after an utterance, the more the caller will want to leave a message. The experimental conditions used to evaluate these hypotheses with an opinion score are explained. The system consists of a PC, a telephone line interface board, and control software. To eliminate the effect of recognition accuracy, the dialogs are designed without speech recognition. Finally, experimental results are described that indicate the usefulness of each hypothesis and show an increase of 0.7-0.8 points in mean opinion score.
Title: Utterance promoting methods on speech dialog systems. Published in: Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications.