Building interfaces where the user-system communication relies on speech is often motivated by the added value expected from speech: a more natural, efficient communication that also frees the hands (and the eyes) of the user. However, when developing such an interface, one has to remember that just like other systems, spoken interfaces require a proper design, implying an adequate analysis of the user's needs throughout the dialogue. The VODIS project has led to the design and development of a spoken interface for the control of car equipment. Due to the workload caused by the task of driving the vehicle, spoken communication provides a potentially safe and efficient mode of controlling the car equipment. Here we report the main characteristics of the central module of the system, the Dialogue Manager, designed and implemented by IPO, which comprises the necessary components aimed at achieving the goal of a robust and efficient dialogue system. We mainly concentrate on two important activities carried out in the project: the integration of the various modules into a task model taking into account the characteristics of a spoken command dialogue, and the necessity of customizing parts of the system to spoken communication.
{"title":"Speech and the user interface of consumer products : the VODIS system","authors":"Xhg Xavier Pouteau","doi":"10.1037/e496092004-001","DOIUrl":"https://doi.org/10.1037/e496092004-001","journal":{"name":"IPO Annual Progress Report","volume":"1 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127880381"}
In this paper a description is given of an experiment in which the feasibility of a non-visual GUI device was explored. Both blind and blindfolded subjects participated in the experiment. Because experienced blind GUI users are scarce, a playing card metaphor was used as the basis for the screens presented to the subjects. During the experiment, subjects were asked to locate and select/drag a specific object with the help of the non-visual interaction device. The results of the experiment show that the interaction device is suited for use in a non-visual GUI access system. However, the results also indicated that the addition of an auditory/tactile localization aid is desirable.
{"title":"GUI access for blind users : a sound initiative","authors":"L.H.D. Poll, J. H. Eggen","doi":"10.1037/e492272004-001","DOIUrl":"https://doi.org/10.1037/e492272004-001","journal":{"name":"IPO Annual Progress Report","volume":"22 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"117184298"}
Within the area of broadcasting and entertainment, stereoscopic displays are used to heighten the viewer's sense of excitement and quality. To evaluate these subjective experiences, an appreciation-oriented approach may be appropriate. Within this context, the current study investigates the influence of image disparity, convergence distance and focal length on the subjective assessment of depth, naturalness of depth and quality of depth. Twelve observers viewed a set of stereoscopic still images varying in image disparity, convergence distance and focal length. Each observer was asked to rate his/her impression of depth, naturalness of depth and quality of depth, in separate counterbalanced sessions. Results indicate that observers prefer a stereoscopic presentation of images over a monoscopic presentation. A clear optimum was found at 4 cm image disparity for both subjective judgments of naturalness and of quality. A focal length effect was only found for extreme image disparities. Although there was a strong linear relationship between naturalness and quality (a correlation of r = 0.96), a small but systematic deviation could be observed. This quality-naturalness shift is discussed in relation to similar, yet more pronounced findings in the domain of colour perception.
{"title":"The effect of image disparity, convergence distance and focal length on perceived quality in stereoscopic displays","authors":"W. IJsselsteijn, de H Huib Ridder, R. Hamberg","doi":"10.1037/e492392004-001","DOIUrl":"https://doi.org/10.1037/e492392004-001","journal":{"name":"IPO Annual Progress Report","volume":"45 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"124554227"}
This paper describes the design of pragmatic-interpretation and dialogue-management modules in an automatic enquiry system that can be consulted through spoken natural language over the telephone. The system is designed around a central multi-level data structure representing the discourse that has unfolded during the dialogue. At the highest level of this discourse representation the information exchange is represented as a series of information-state changes or updates. Several conditions in the information state itself give rise to actions of the dialogue manager. The dialogue manager is designed to achieve the user's goal in a manner that is understandable to the user, efficient and correct. This is not a trivial problem because natural language and, in particular, speech understanding lead to many uncertainties. To deal with uncertain information, we have designed feedback and verification mechanisms and means for contextual understanding, underspecification and pragmatic inferencing.
{"title":"Pragmatic interpretation and dialogue management in spoken-language systems","authors":"G. V. V. Zanten","doi":"10.1037/E492692004-001","DOIUrl":"https://doi.org/10.1037/E492692004-001","journal":{"name":"IPO Annual Progress Report","volume":"45 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"133400312"}
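The abstract on pragmatic interpretation and dialogue management describes a dialogue manager whose actions are triggered by conditions in an information state, with verification used to handle uncertain speech input. The Python sketch below is one possible reading of that idea, not the system's actual design. The slot names, the `VERIFY_THRESHOLD` value, the confidence scores and the action labels are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical threshold: values recognised with lower confidence get verified.
VERIFY_THRESHOLD = 0.8


@dataclass
class Slot:
    value: Optional[str] = None
    confidence: float = 0.0  # assumed recogniser confidence in [0, 1]
    verified: bool = False


@dataclass
class InformationState:
    # Hypothetical slots for a travel-enquiry task.
    slots: Dict[str, Slot] = field(default_factory=lambda: {
        "origin": Slot(), "destination": Slot(), "time": Slot()})

    def update(self, name: str, value: str, confidence: float) -> None:
        """Record a possibly uncertain value extracted from a user utterance."""
        self.slots[name] = Slot(value, confidence)

    def confirm(self, name: str) -> None:
        """Mark a slot as verified after the user has confirmed it."""
        self.slots[name].verified = True


def next_action(state: InformationState) -> Tuple[str, Optional[str]]:
    """Choose the next system move from conditions in the information state."""
    # Uncertain, unverified values are checked with the user first.
    for name, slot in state.slots.items():
        if (slot.value is not None and not slot.verified
                and slot.confidence < VERIFY_THRESHOLD):
            return ("verify", name)
    # Otherwise ask for the first piece of information still missing.
    for name, slot in state.slots.items():
        if slot.value is None:
            return ("ask", name)
    # Everything is known and trusted: answer the enquiry.
    return ("answer", None)


state = InformationState()
first = next_action(state)                      # nothing known yet
state.update("origin", "Eindhoven", 0.95)       # confident recognition
state.update("destination", "Amsterdam", 0.55)  # uncertain recognition
second = next_action(state)
state.confirm("destination")
third = next_action(state)
state.update("time", "09:00", 0.90)
fourth = next_action(state)
```

In this sketch each user contribution is an update to the state, and the manager's move depends only on the state's current contents. Verifying before asking is a design choice made here for simplicity; a real system could equally fold verification into its next question.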
Modelling rhythmic characteristics of speech is expected to contribute to the acceptability of synthetic speech. However, before rules for the control of speech rhythm in synthetic speech can be developed, we need to know which properties of speech give rise to the perception of speech rhythm. An experiment is described which investigates how the distributions of stressed syllables and pitch accents contribute to the perceived rhythmicity of speech. The outcomes show that the perception of rhythm is related to the distribution of locally prominent syllables: primarily to accents, but also to stressed syllables in long stretches of speech without accented syllables. Furthermore, it appears that, once a rhythmic pattern has been established by the initial part of an utterance, listeners are quite tolerant of local deviations from this pattern later on in the utterance.
{"title":"The role of stress and accent in the perception of speech rhythm","authors":"Cn Grover, J. Terken","doi":"10.1037/e495112004-001","DOIUrl":"https://doi.org/10.1037/e495112004-001","journal":{"name":"IPO Annual Progress Report","volume":"32 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"121595603"}
With the expanding possibilities of computer-controlled consumer products and the accordingly increasing complexity of manipulation, ease of use becomes an attribute of growing importance. Since it is a rather vague concept covering a variety of aspects, this paper describes a method which permits quantitative assessment of ease of use and its deciding factors. In particular, the preference subjects show for using specific entry devices as a function of their communication efficiency has been investigated. Users' preference for performing a given data-entry task by means of one of the available input devices could be manipulated by experimental variation of the relative speed and accuracy of the interaction. Our preliminary results show the proposed method to be a promising means of quantitative assessment of the determinants of ease of use. Accordingly, suggestions are given for further improvement of the experiments.
{"title":"Quantitative assessment of communication efficiency and users' preference: experiments worth improving","authors":"F. L. Engel, R. Haakma, J. V. D. Vijver","doi":"10.1037/e492452004-001","DOIUrl":"https://doi.org/10.1037/e492452004-001","journal":{"name":"IPO Annual Progress Report","volume":"25 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"129562314"}
Bm Richard van de Sluis, Jh Berry Eggen, J. Rypkema
This study explores the end-user benefits of using nonspeech audio in television user interfaces. A prototype of an Electronic Programme Guide (EPG) served as a carrier for the research. One of the features of this EPG is the possibility to search for TV programmes in a category-based way. The EPG prototype was 'sonically-enhanced' with so-called category sounds. These category sounds were also used as auditory reminders indicating that a TV programme from a given category is about to start. Furthermore, certain characteristics of the category sound were manipulated to represent the urgency of a reminder. Two experiments are described. In the first experiment, the usability of category sounds was evaluated. In the second experiment, it was tested whether 'listener-source distance' is an appropriate metaphor to inform users about the urgency of an auditory reminder. The results showed that people can easily learn to match the category sounds to the corresponding TV programme categories, that the use of category sounds is effective, and that the category sounds were appreciated by a large part of the subjects. In the second experiment, it was found that the distance of a sound source is a useful metaphor to use in an auditory reminder to indicate the distance in time before a programme is going to start.
{"title":"Nonspeech audio in user interfaces for TV","authors":"Bm Richard van de Sluis, Jh Berry Eggen, J. Rypkema","doi":"10.1037/e491952004-001","DOIUrl":"https://doi.org/10.1037/e491952004-001","journal":{"name":"IPO Annual Progress Report","volume":"180 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"122279477"}
A new approach for user navigation and guidance is described in which initiative and topic selection by an interactive instruction system can be combined with initiative and topic selection by a student. The topic selection of the system is based on foreknowledge, goals and the capabilities of the individual student. In an experiment, we found that topic selection by the system had an advantage for students who were unable to monitor their own learning process.
{"title":"User navigation and guidance","authors":"J. Masthoff","doi":"10.1037/e490972004-001","DOIUrl":"https://doi.org/10.1037/e490972004-001","journal":{"name":"IPO Annual Progress Report","volume":"319 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"132793693"}
Medical specialists are interested in applying Automatic Speech Recognition to save time and money spent on medical reporting. A number of issues need to be resolved in order to apply this technology successfully. This paper concentrates on the issue of feedback. An experiment is described in which the most appropriate feedback modality for presenting the recognition result in the situation of the pathologist is investigated. Although at this moment visual feedback seems the safest solution, error-detection performance is still poor. A totally different approach to medical reporting is presented which may prove to be the best solution.
{"title":"Automatic speech recognition in the medical environment","authors":"E. Verheijen","doi":"10.1037/e491832004-001","DOIUrl":"https://doi.org/10.1037/e491832004-001","journal":{"name":"IPO Annual Progress Report","volume":"22 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"134038834"}
This paper describes the results of an experiment designed to understand task-directed human explorative behaviour in a large music collection. The subject's task was to compile a music programme preferred in a specific context-of-use, e.g., romantic evening, party. Experimental conditions were defined in which subjects were provided with no music recommendations, randomly drawn recommendations, or algorithmically determined recommendations while carrying out the task. The provision of recommendations was meant to improve performance in the compilation task. When recommendations were provided, subjects systematically selected, played back, and compiled fewer items by themselves, and instead made use of the recommendations. This observation was not coupled with a reduction in the amount of time spent on the compilation task. However, when asked for their preference, subjects chose the provision of algorithmically determined recommendations over the provision of randomly drawn recommendations or no recommendations.
{"title":"Explorative strategies while compiling music","authors":"S. Pauws, J. H. Eggen, D. Bouwhuis","doi":"10.1037/e491082004-001","DOIUrl":"https://doi.org/10.1037/e491082004-001","journal":{"name":"IPO Annual Progress Report","volume":"63 1","pages":"0"},"publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"133283285"}