Increasingly, information is presented to users in a spatial domain where distances and orientation between objects imply some meaning. One's perception of distances between objects may be influenced by actual movement through space. Distances may be represented by visual, tactual, or auditory means. The current paper considers the relationship between judgments of linear path distances that were presented to subjects tactually, visually, or both visually and tactually. Tactual paths were created virtually using force-feedback fields. Additionally, the influence of a constant simulated-friction force on distance judgments was examined. Based on the method of direct magnitude estimation, a high correlation between tactual and visual estimates was found for eight path lengths. The results of the tactual condition with simulated friction indicated that the perceived distance between tactual objects can be manipulated without requiring longer movements of an input device. In general, results indicated that the spatial relations between objects can be accurately communicated by virtual tactual paths, which allows for the creation of dynamic spatial relations between user-interface elements.
{"title":"Estimation of virtually perceived distance","authors":"Dv David Keyson","doi":"10.1037/e491492004-001","DOIUrl":"https://doi.org/10.1037/e491492004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116903258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a system for the automatic labelling of prosodic events. A rule-based approach was chosen in order to make explicit the generally implicit knowledge and classification strategy applied in prosodic labelling. The purpose of the system is to label accents and phrase boundaries. The language chosen was French. First, the psychoacoustic parameters pitch, duration and loudness are extracted from the signal and pre-processed in order to find cues related to the prosodic events. Then a set of rules is used to examine the results of this analysis and to determine the placement of the labels.
{"title":"Automatic labelling of prosodic events","authors":"F. Beaugendre, D. Hermes, G. Leenhardt","doi":"10.1037/e490122004-001","DOIUrl":"https://doi.org/10.1037/e490122004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121071909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In spoken dialogue systems, in which humans interact with computers over the telephone, it is essential that the voice output of the system be of high quality. Both the intelligibility and the naturalness of the output should be sufficiently high. There are several techniques for providing a system with speech output, each with its own advantages and disadvantages. This paper discusses a formal evaluation experiment of three speech output techniques. Natural speech was included as a reference condition. The speech was rated on intelligibility and fluency of the output. Additionally, the overall quality of the speech and its suitability for use in a commercial application were assessed. The results reveal significant differences between the techniques. Diphone synthesis still has an inferior quality compared to the other techniques, both in terms of intelligibility and fluency. Conventional phrase concatenation is quite intelligible, but scores less on fluency. IPO's phrase concatenation is by far the best technique.
{"title":"On the performance of speech output in a practical setting","authors":"E. Klabbers, R. Collier","doi":"10.1037/e496512004-001","DOIUrl":"https://doi.org/10.1037/e496512004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128659838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In image display, the numerical data in the computer memory are mapped to luminance values on the screen. These luminances are transformed into a brightness impression by the observer. For medical images, the data often represent information with a quantitative meaning, such as the radiation absorption by tissues. It is, therefore, crucial for this information to be accurately transferred to brightness. Perceptual linearization means that equal steps in the data evoke equal steps in brightness sensation. Perceptually linear grey scales were formed by using magnitude estimation of brightness in simple stimuli. The resulting linearized lookup tables were then applied to complex images. Brightness matching was used to determine grey levels at specified image locations, prior to and after linearization. Results show that the accuracy in the matching task is not worsened after perceptual linearization. Hence, this method could be considered for standardization of the display.
{"title":"Perceptual linearization as a standard for displays","authors":"N. Belaïd, van Wmcj Ineke Overveld, J. B. Martens","doi":"10.1037/e494572004-001","DOIUrl":"https://doi.org/10.1037/e494572004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128052158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
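The idea of a perceptually linearizing lookup table in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the display model (`display_luminance`, with a gamma of 2.2 and a 0.5 to 100 cd/m^2 range) and the power-law `brightness` function are assumptions standing in for the magnitude-estimation data the paper actually measured.

```python
def display_luminance(grey, l_min=0.5, l_max=100.0, gamma=2.2):
    """Hypothetical display model: luminance (cd/m^2) of an 8-bit grey level."""
    return l_min + (l_max - l_min) * (grey / 255.0) ** gamma


def brightness(luminance, exponent=1.0 / 3.0):
    """Assumed power-law brightness function; a stand-in for measured data."""
    return luminance ** exponent


def linearizing_lut(levels=256):
    """For each data value, pick the grey level whose brightness is closest
    to an equally spaced target brightness."""
    b = [brightness(display_luminance(g)) for g in range(levels)]
    b_min, b_max = b[0], b[-1]
    lut = []
    g = 0
    for d in range(levels):
        target = b_min + (b_max - b_min) * d / (levels - 1)
        # Brightness rises monotonically with grey level, so one pointer
        # that only moves forward finds the nearest level for each target.
        while g + 1 < levels and abs(b[g + 1] - target) <= abs(b[g] - target):
            g += 1
        lut.append(g)
    return lut


lut = linearizing_lut()
```

Applying `lut[value]` to every pixel before display would, under these assumed models, make equal steps in the data correspond to approximately equal steps in brightness.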
A music compilation strategy, named Personalized Automatic Track Selection (PATS), has been implemented which aims at automatically compiling music programmes that are preferred in a specific context-of-use. It is additionally assumed that subsequent PATS-compiled programmes adapt to the designated context-of-use. The paper reports the results of an experiment designed to evaluate the appreciation of PATS-compiled programmes. More concretely, tests with naive users assessed the appreciation of PATS-compiled programmes compared to randomly assembled programmes in two different contexts-of-use employing a within-subject paradigm. Appreciation was expressed by the measures precision and coverage. The results of the experiment demonstrate that the PATS-compiled programmes contained more preferred and more varied musical content than randomly assembled programmes in both contexts-of-use. In addition, PATS-compiled programmes appeared to contain more preferred content over trials.
{"title":"A comparative evaluation of strategies for compiling music programmes","authors":"S. Pauws, D. Ober, J. H. Eggen, D. Bouwhuis","doi":"10.1037/e490542004-001","DOIUrl":"https://doi.org/10.1037/e490542004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121918869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision can be regarded as 'inverse optics', i.e., the process in which measured characteristics of an optical image of the environment are used to reconstruct the material properties of this environment. Depending on the type of measurement that is being performed, flexibility of the metric upon which the measurement results are represented can be exploited to optimize the discriminability of the items whose characteristics are measured. We will derive approximate analytical expressions for such optimal metrics and discuss the implications of this idea for (the optimization of) the quality of displayed images.
{"title":"Towards the optimization of visual inputs","authors":"Tjwm Ruud Janssen, Fjj Frans Blommaert","doi":"10.1037/e494692004-001","DOIUrl":"https://doi.org/10.1037/e494692004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133036725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conventional image decomposition techniques are limited in their redundancy reduction properties due to their inability to detect essential structural aspects in images. By using image analysis tools that are capable of detecting and extracting important image structures, however, more efficient coding algorithms can be developed. Also, content-based coding enables compression algorithms to be tuned more specifically to visually important image elements. To demonstrate the efficiency of such a content-based approach, we present an image compression scheme based on a Hermite transform that adapts to local image orientations. Simulations show that orientation adaptivity results in a significant reduction of redundancy. Comparisons with other compression techniques such as JPEG indicate that the proposed scheme performs very well for high compression ratios, not only in terms of peak signal-to-noise ratio but also in terms of perceptual image quality.
{"title":"Transform coding of images using local orientation adaptivity","authors":"A. M. V. Dijk","doi":"10.1037/E493772004-001","DOIUrl":"https://doi.org/10.1037/E493772004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115200608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
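The abstract above compares coders by peak signal-to-noise ratio. For readers unfamiliar with the measure, the standard definition can be sketched as below; the function name `psnr` and the default 8-bit peak of 255 are choices made for this illustration, not details taken from the paper.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of pixel values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the decoded image.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the reconstruction is numerically closer to the original; as the abstract notes, this does not necessarily track perceptual image quality, which is why both were assessed.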
It has been shown that in Dutch a rising or falling pitch movement can unambiguously accent a syllable Sn when the onset of the movement is positioned in an interval starting some tens of milliseconds before the onset of the vowel and ending somewhat before the offset of the vowel of Sn. As the start of the movement is gradually shifted to positions later in the syllable, the percept of accentuation gradually shifts from syllable Sn to the following syllable Sn+1. This was shown in experiments with reiterant, resynthesized speech utterances like /.a.a.a.a.a/ and /mamamamama/ in which the position of the onset of rising and falling pitch movements was systematically varied, and subjects were asked to indicate which syllable they perceived as accented. In most cases the number of responses "syllable Sn accented" gradually decreases sigmoidally as the onset of the pitch movement starts later, but, in some cases, especially for falls in /.a.a.a.a.a/ stimuli with relatively long silent periods between the vowels, a plateau can be distinguished in the response distribution during which the proportion of responses "syllable Sn accented" does not continue to decrease, but remains constant. In these experiments, every stimulus was presented only twice to each subject, so that no clear response distributions of individual subjects were available, which did not allow for an interpretation of the nature of this plateau. To find out more about the nature of this plateau, response distributions of individual subjects were determined and compared. It is shown that the response distributions of single subjects do not show a plateau. The plateau arises from the fact that some subjects have an early accentuation boundary, while others have a late accentuation boundary. The implications of these findings are discussed.
{"title":"Individual differences in accentuation boundaries in Dutch","authors":"D. Hermes, F. Beaugendre, D. House","doi":"10.1037/e495132004-001","DOIUrl":"https://doi.org/10.1037/e495132004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127535865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
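The explanation of the plateau given in the abstract above (individual sigmoids with different boundaries, pooled across subjects) can be illustrated numerically. The sketch below is not the authors' analysis: the logistic shape, the boundary positions of 50 ms and 200 ms, the 10 ms slope, and the group composition are all hypothetical values chosen only to make the effect visible.

```python
import math


def p_sn_accented(onset_ms, boundary_ms, slope_ms=10.0):
    """Probability that one listener responds 'syllable Sn accented',
    modelled as a falling logistic function of pitch-movement onset."""
    return 1.0 / (1.0 + math.exp((onset_ms - boundary_ms) / slope_ms))


def pooled_response(onset_ms, boundaries_ms):
    """Response proportion averaged over listeners with different boundaries."""
    return sum(p_sn_accented(onset_ms, b) for b in boundaries_ms) / len(boundaries_ms)


# Hypothetical group: half the listeners have an early accentuation
# boundary (50 ms), the other half a late one (200 ms).
boundaries = [50.0] * 5 + [200.0] * 5

for onset in range(0, 251, 25):
    print(onset, round(pooled_response(onset, boundaries), 3))
```

In this toy group every individual curve falls smoothly, yet the pooled curve stays near 0.5 for onsets between the two boundaries, which is the kind of plateau the abstract attributes to between-subject differences.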
Two experiments are reported in which subjects identified the perceived material of synthetic impact sounds varying in two parameters: centre frequency and decay time. In one experiment, subjects were asked to write down the material they thought had produced the sound. The stimulus uncertainty was kept low, as the subjects could listen to a stimulus as often as they liked and listened to the sounds over headphones. In the other experiment, stimulus uncertainty was high, as subjects had to choose, from five alternatives and within three seconds after a single stimulus presentation, the material they thought had produced the sound. In this forced-choice condition, the five alternatives were the five materials most often mentioned on the response sheets in the first experiment. These were 'metal', 'wood', 'glass', 'rubber', and 'plastic'. It appears that the parameter values for which 'metal', 'wood', and 'glass' are perceived are similar for the two experimental conditions. For 'rubber' and even more so for 'plastic' the responses are more variable.
{"title":"Auditory material perception","authors":"D. Hermes","doi":"10.1037/e492042004-001","DOIUrl":"https://doi.org/10.1037/e492042004-001","url":null,"PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129478198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
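A synthetic impact sound controlled by a centre frequency and a decay time, as in the abstract above, is often approximated by an exponentially decaying sinusoid. The sketch below assumes that model; the abstract does not state the actual synthesis method, and the function name `impact_sound`, the 1/e definition of decay time, the 0.5 s duration and the 44.1 kHz sample rate are all choices made for this illustration.

```python
import math


def impact_sound(centre_hz, decay_s, duration_s=0.5, sample_rate=44100):
    """Samples of an exponentially decaying sinusoid.

    decay_s is taken to be the time in which the envelope falls to 1/e
    of its initial value (an assumed definition).
    """
    n = int(duration_s * sample_rate)
    return [
        math.exp(-t / decay_s) * math.sin(2.0 * math.pi * centre_hz * t)
        for t in (i / sample_rate for i in range(n))
    ]


# Example parameter pair (hypothetical values): 1 kHz, 50 ms decay.
samples = impact_sound(centre_hz=1000.0, decay_s=0.05)
```

Varying `centre_hz` and `decay_s` over a grid would give a two-parameter stimulus set of the general kind the experiments describe.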
Pub Date: 1900-01-01; DOI: 10.1007/978-3-642-85104-9_37
G. Spaai, A. Storm, D. Hermes
{"title":"A visual display system for the teaching of intonation to deaf persons","authors":"G. Spaai, A. Storm, D. Hermes","doi":"10.1007/978-3-642-85104-9_37","DOIUrl":"https://doi.org/10.1007/978-3-642-85104-9_37","url":null,"abstract":"","PeriodicalId":369207,"journal":{"name":"IPO Annual Progress Report","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124972730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}