Kazuya Otake, S. Okamoto, Yasuhiro Akiyama, Yoji Yamada
There is increasing demand for tactile feedback functions for touch panels. We investigated whether virtual roughness texture quality can be improved through simultaneous use of vibrotactile and electrostatic-friction stimuli. This conjunctive use is expected to improve the perceptual quality of texture stimuli, because vibrotactile and electrostatic-friction stimuli have complementary characteristics. Our previous studies confirmed that these conjunct stimuli yield enhanced realism for simple grating roughness. In this study, we conducted experiments using simple and complex sinusoidal surface profiles consisting of one or two spatial wave components. Three different evaluation criteria were employed. The first criterion concerned the subjective realism, i.e., similarity with actual roughness textures, of virtual roughness textures. Participants compared the following three stimulus conditions: vibrotactile stimuli only, electrostatic-friction stimuli only, and their conjunct stimuli. The conjunct stimuli yielded the greatest realism. The second criterion concerned roughness texture identification under each of the three stimulus conditions for five different roughness textures. The highest identification accuracy rate was achieved under the conjunct stimulus condition; however, the performance difference was marginal. The third criterion concerned the discrimination threshold of the grating-scale spatial wavelength. There were no marked differences among the results for the three conditions. The findings of this study will improve virtual texture quality for touch-panel-type surface tactile displays.
"Tactile Texture Display Combining Vibrotactile and Electrostatic-friction Stimuli: Substantial Effects on Realism and Moderate Effects on Behavioral Responses." ACM Transactions on Applied Perception, pp. 1–18, May 30, 2022. DOI: 10.1145/3539733
Driving simulators are established tools used during automotive development and research. Most simulators use either monitors or projectors as their primary display system. However, the emergence of a new generation of head-mounted displays has triggered interest in using these as the primary display type. The general benefits and drawbacks of head-mounted displays are well researched, but their effect on driving behavior in a simulator has not been sufficiently quantified. This article presents a study of driving behavior differences between projector-based graphics and a head-mounted display in a large dynamic driving simulator. The study selected five specific driving maneuvers suspected of affecting driving behavior differently depending on the choice of display technology. Some of these maneuvers were chosen to reveal changes in lateral and longitudinal driving behavior. Others were picked for their ability to highlight the benefits and drawbacks of head-mounted displays in a driving context. The results show minor differences in lateral and longitudinal driving behavior when comparing projectors and a head-mounted display. The most noticeable difference in favor of projectors was seen when display resolution was critical to the driving task. The choice of display type affected neither simulator sickness nor the realism rated by the subjects.
Björn Blissing, F. Bruzelius, Olle Eriksson. "The Effects on Driving Behavior When Using a Head-mounted Display in a Dynamic Driving Simulator." ACM Transactions on Applied Perception, pp. 4:1–4:18, January 31, 2022. DOI: 10.1145/3483793
Anca Salagean, Jacob Hadnett-Hunter, Daniel J. Finnegan, A. A. Sousa, M. Proulx
Ultrasonic mid-air haptic technologies, which provide haptic feedback through airwaves produced using ultrasound, could be employed to investigate the sense of body ownership and immersion in virtual reality (VR) by inducing the virtual hand illusion (VHI). Ultrasonic mid-air haptic perception has solely been investigated for glabrous (hairless) skin, which has higher tactile sensitivity than hairy skin. In contrast, the VHI paradigm typically targets hairy skin without comparisons to glabrous skin. The aim of this article was to investigate illusory body ownership, the applicability of ultrasonic mid-air haptics, and perceived immersion in VR using the VHI. Fifty participants viewed a virtual hand being stroked by a feather synchronously and asynchronously with the ultrasonic stimulation applied to the glabrous skin on the palmar surface and the hairy skin on the dorsal surface of their hands. Questionnaire responses revealed that synchronous stimulation induced a stronger VHI than asynchronous stimulation. In synchronous conditions, the VHI was stronger for palmar stimulation than dorsal stimulation. The ultrasonic stimulation was also perceived as more intense on the palmar surface compared to the dorsal surface. Perceived immersion was not related to illusory body ownership per se but was enhanced by the provision of synchronous stimulation.
"A Virtual Reality Application of the Rubber Hand Illusion Induced by Ultrasonic Mid-air Haptic Stimulation." ACM Transactions on Applied Perception, pp. 3:1–3:19, January 31, 2022. DOI: 10.1145/3487563
Wanyu Liu, Michelle Agnes Magalhaes, W. Mackay, M. Beaudouin-Lafon, Frédéric Bevilacqua
With the increasing interest in movement sonification and expressive gesture-based interaction, it is important to understand which factors contribute to movement learning and how. We explore the effects of movement sonification and users’ musical background on motor variability in complex gesture learning. We contribute an empirical study in which musicians and non-musicians learn two gesture sequences over three days, with and without movement sonification. Results show the interlaced interaction effects of these factors and how they unfold over the three-day learning process. For gesture 1, which is fast and dynamic with a direct “action-sound” sonification, movement sonification induces higher variability for both musicians and non-musicians on day 1. While musicians reduce this variability on days 2 and 3 to a level similar to the no-auditory-feedback condition, non-musicians continue to show significantly higher variability. Across the three days, musicians also have significantly lower variability than non-musicians. For gesture 2, which is slow and smooth with an “action-music” metaphor, there are virtually no effects. Based on these findings, we recommend that future studies take participants’ musical background into account, consider longitudinal designs to examine these effects on complex gestures, and interpret results with awareness of the specific design of gesture and sound.
"Motor Variability in Complex Gesture Learning: Effects of Movement Sonification and Musical Background." ACM Transactions on Applied Perception, pp. 2:1–2:21, January 31, 2022. DOI: 10.1145/3482967
Laban Movement Analysis (LMA) and its Effort element provide a conceptual framework through which we can observe, describe, and interpret the intention of movement. Effort attributes provide a link between how people move and how their movement communicates to others. It is crucial to investigate the perceptual characteristics of Effort to validate whether it can serve as an effective framework to support a wide range of applications in animation and robotics that require a system for creating or perceiving expressive variation in motion. To this end, we first constructed an Effort motion database of short video clips of five different motions: walk, sit down, pass, put, wave, each performed in eight ways corresponding to the extremes of the Effort elements. We then performed a perceptual evaluation to examine the perceptual consistency and perceived associations among Effort elements: Space (Indirect/Direct), Time (Sustained/Sudden), Weight (Light/Strong), and Flow (Free/Bound) that appeared in the motion stimuli. The results of the perceptual consistency evaluation indicate that although observers do not always perceive the LMA Effort elements exactly as intended, true response rates exceed false response rates for seven of the eight Effort elements; the exception is light Effort. The perceptual consistency results showed varying tendencies by motion. The perceptual association between LMA Effort elements showed that a single LMA Effort element tends to co-occur with the elements of other factors, showing significant correlation with one or two factors (e.g., indirect and free, light and free).
Hyejin Kim, Michael Neff, Sung-Hee Lee. "The Perceptual Consistency and Association of the LMA Effort Elements." ACM Transactions on Applied Perception, pp. 1:1–1:17, January 31, 2022. DOI: 10.1145/3473041
Eakta Jain, A. Olivier. "Introduction to the Special Issue on SAP 2021." ACM Transactions on Applied Perception, pp. 18:1–18:2, October 31, 2021. DOI: 10.1145/3486577
A. Adkins, Lorraine Lin, Aline Normoyle, Ryan Canales, Yuting Ye, S. Jörg
A primary goal of the Virtual Reality (VR) community is to build fully immersive and presence-inducing environments with seamless and natural interactions. To reach this goal, researchers are investigating how to best directly use our hands to interact with a virtual environment using hand tracking. Most studies in this field require participants to perform repetitive tasks. In this article, we investigate if results of such studies translate into a real application and game-like experience. We designed a virtual escape room in which participants interact with various objects to gather clues and complete puzzles. In a between-subjects study, we examine the effects of two input modalities (controllers vs. hand tracking) and two grasping visualizations (continuously tracked hands vs. virtual hands that disappear when grasping) on ownership, realism, efficiency, enjoyment, and presence. Our results show that ownership, realism, enjoyment, and presence increased when using hand tracking compared to controllers. Visualizing the tracked hands during grasps leads to higher ratings in one of our ownership questions and one of our enjoyment questions compared to having the virtual hands disappear during grasps as is common in many applications. We also confirm some of the main results of two studies that have a repetitive design in a more realistic gaming scenario that might be closer to a typical user experience.
"Evaluating Grasping Visualizations and Control Modes in a VR Game." ACM Transactions on Applied Perception, pp. 19:1–19:14, October 31, 2021. DOI: 10.1145/3486582
Previous perceptual studies on human faces have shown that specific facial features have consistent effects on perceived personality and appeal, but it remains unclear if and how findings relate to perception of virtual characters. For example, wider human faces have been found to appear more aggressive and dominant, whereas studies on virtual characters have shown opposite trends but have suffered from significant eeriness of exaggerated features. In this study, we use highly realistic virtual faces obtained from 3D scanning, as well as cartoon-rendered counterparts retaining facial proportions. We assess the effects of facial width and eye size on perceptions of appeal, trustworthiness, aggressiveness, dominance, and eeriness. Our manipulations did not affect eeriness, and we find the same perceptual trends previously reported for human faces.
Ylva Ferstl, Michael McKay, R. Mcdonnell. "Facial Feature Manipulation for Trait Portrayal in Realistic and Cartoon-Rendered Characters." ACM Transactions on Applied Perception, pp. 22:1–22:8, October 31, 2021. DOI: 10.1145/3486579
Virtual reality (VR) displays have factors, such as vergence-accommodation conflicts, that negatively impact depth perception and cause users to misjudge distances when selecting objects. In addition, popular large-screen immersive displays present the depth of any rendered target through the screen parallax information of points, which are encapsulated within stereoscopic voxels: distinct units of space that dictate how far an object is placed in front of or behind the screen. Because these voxels emanate from the viewer’s eyes (the left and right centers of projection), their density is higher in front of the screen (in regions of negative screen parallax) than behind the screen (in regions of positive screen parallax), implying a higher spatial resolution of depth in front of the screen than behind it. Our experiment implements a near-field fine-motor pick-and-place task in which users pick up a ring and place it around a targeted peg. The targets are arranged in a linear configuration of 3, 5, and 7 pegs along the front-and-back axis, with the center peg placed at the same depth as the screen. We use this to evaluate how users manipulate objects in positive versus negative screen parallax space by the metrics of efficiency, accuracy, and economy of movement. In addition, we evaluate how users’ performance is moderated by haptic feedback and by mismatch between visual and proprioceptive information. Our results reveal that users perform more efficiently in negative screen parallax space and that haptic feedback and visuo-proprioceptive mismatch have effects on placement efficiency. The implications of these findings are described in the later sections of the article.
David Brickler, Sabarish V. Babu. "An Evaluation of Screen Parallax, Haptic Feedback, and Sensory-Motor Mismatch on Near-Field Perception-Action Coordination in VR." ACM Transactions on Applied Perception, pp. 20:1–20:16, October 31, 2021. DOI: 10.1145/3486583
Jonathan Ehret, A. Bönsch, Lukas Aspöck, Christine T. Röhr, S. Baumann, M. Grice, J. Fels, T. Kuhlen
For conversational agents’ speech, either all possible sentences have to be prerecorded by voice actors or the required utterances can be synthesized. While synthesizing speech is more flexible and economical in production, it can also reduce the perceived naturalness of the agents, owing in part to mistakes at various linguistic levels. In our article, we are interested in the impact of adequate and inadequate prosody, here particularly in terms of accent placement, on the perceived naturalness and aliveness of the agents. We compare (1) inadequate prosody, as generated by off-the-shelf text-to-speech (TTS) engines with synthetic output; (2) the same inadequate prosody imitated by trained human speakers; and (3) adequate prosody produced by those speakers. The speech was presented either as audio-only or by embodied, anthropomorphic agents, to investigate the potential masking effect of a simultaneous visual representation of those virtual agents. To this end, we conducted an online study with 40 participants listening to four different dialogues, each presented in the three Speech levels and the two Embodiment levels. Results confirmed that adequate prosody in human speech is perceived as more natural (and the agents are perceived as more alive) than inadequate prosody in both human (2) and synthetic speech (1). Thus, it is not sufficient to just use a human voice for an agent’s speech to be perceived as natural—it is decisive whether the prosodic realisation is adequate or not. Furthermore, and surprisingly, we found no masking effect of speaker embodiment: neither a human voice with inadequate prosody nor a synthetic voice was judged as more natural when a virtual agent was visible than in the audio-only condition. On the contrary, the human voice was even judged as less “alive” when accompanied by a virtual agent.
In sum, our results emphasize, on the one hand, the importance of adequate prosody for perceived naturalness, especially in terms of accents being placed on important words in the phrase, while showing, on the other hand, that the embodiment of virtual agents plays a minor role in the naturalness ratings of voices.
{"title":"Do Prosody and Embodiment Influence the Perceived Naturalness of Conversational Agents' Speech?","authors":"Jonathan Ehret, A. Bönsch, Lukas Aspöck, Christine T. Röhr, S. Baumann, M. Grice, J. Fels, T. Kuhlen","doi":"10.1145/3486580","DOIUrl":"https://doi.org/10.1145/3486580","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"50 1","pages":"21:1-21:15"},"PeriodicalIF":1.6,"publicationDate":"2021-10-31","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"79129288","RegionNum":4,"RegionCategory":"Computer Science"}