Creating Word Paintings Jointly Considering Semantics, Attention, and Aesthetics
Junsong Zhang, Zuyi Yang, Linchengyu Jin, Zhitang Lu, Jinhui Yu
Pub Date: 2022-09-02. DOI: https://dl.acm.org/doi/10.1145/3539610

In this article, we present a content-aware method for generating word paintings. A word painting is a composite artwork assembled from words extracted from a given text and arranged so that it carries semantics and visual features similar to those of a given source image. Word paintings are usually created by skilled artists through a tedious manual process, especially when generating streamlines and laying out text; we therefore provide a method that lets users create them easily. The key challenge in generating a visually pleasing word painting is designing a textual layout that simultaneously conveys the input image and gives the viewer easy access to the semantic theme. To address this challenge, given an image and its content-related text, we first decompose the input image into several regions and approximate each region with a smooth vector field. At the same time, by analyzing the input text, we extract weighted keywords to serve as the graphic elements. Then, to measure how likely each position in the input image is to attract an observer's attention, we generate a saliency map with our trained visual attention model. Finally, jointly considering visual attention and aesthetic rules, we propose an energy-based optimization framework that arranges the extracted keywords into the decomposed regions and synthesizes a word painting. Experimental results and user studies show that the method generates fashionable and appealing word paintings.
{"title":"Creating Word Paintings Jointly Considering Semantics, Attention, and Aesthetics","authors":"Junsong Zhang, Zuyi Yang, Linchengyu Jin, Zhitang Lu, Jinhui Yu","doi":"https://dl.acm.org/doi/10.1145/3539610","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3539610","url":null,"abstract":"<p>In this article, we present a content-aware method for generating a word painting. Word painting is a composite artwork made from the assemblage of words extracted from a given text, which carries similar semantics and visual features to a given source image. However, word painting, usually created by skilled artists, involves tedious manual processes, especially when generating streamlines and laying out text. Hence, we provide an easy method to create word paintings for users. How to design textural layout that simultaneously conveys the input image and enables easy access to the semantic theme is the key challenge to generating a visually pleasing word painting. To address this issue, given an image and its content-related text, we first decompose the input image into several regions and approximate each region with a smooth vector field. At the same time, by analyzing the input text, we extract some weighted keywords as the graphic elements. Then, to measure the likelihood of positions in the input image that attract the observers’ attention, we generate a saliency map with our trained visual attention model. Finally, jointly considering visual attention and aesthetic rules, we propose an energy-based optimization framework to arrange extracted keywords into the decomposed regions and synthesize a word painting. Experimental results and user studies show that this method is able to generate a fashionable and appealing word painting.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"54 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vibrotactile Threshold Measurements at the Wrist Using Parallel Vibration Actuators
Elvar Atli Ævarsson, Thórhildur Ásgeirsdóttir, Finnur Pind, Árni Kristjánsson, Runar Unnthorsson
Pub Date: 2022-09-02. DOI: https://dl.acm.org/doi/10.1145/3529259

This article investigates perceptual vibrotactile thresholds across a range of frequencies on both the inside and outside of the wrist when the skin is excited with parallel vibrations, realized using the L5 actuator made by Lofelt GmbH. The vibrotactile thresholds of 30 participants were measured using a modified audiometry test over the frequency range 25–1,000 Hz, and the average threshold at each frequency was determined from acceleration minima. The results show that maximum sensitivity lies in the range 100–275 Hz (peaking at 200 Hz) for the inside of the wrist and 75–250 Hz (peaking at 125 Hz) for the outside, and that thresholds are overall higher for the hairy skin on the outside of the wrist than for the glabrous skin on the inside. The thresholds also varied considerably between individuals, so personalized threshold measurements at the actuator locations will be required to fine-tune a device for its user. This study is part of an ongoing research and development project whose aim is to develop a tactile display device and a music encoding scheme for augmenting the musical enjoyment of cochlear implant recipients. These results, along with results from planned follow-up experiments, will be used to determine the appropriate frequency range and to cast light on the dynamic range available to the tactile device.
{"title":"Vibrotactile Threshold Measurements at the Wrist Using Parallel Vibration Actuators","authors":"Elvar Atli Ævarsson, Thórhildur Ásgeirsdóttir, Finnur Pind, Árni Kristjánsson, Runar Unnthorsson","doi":"https://dl.acm.org/doi/10.1145/3529259","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3529259","url":null,"abstract":"<p>This article presents an investigation into the perceptual vibrotactile thresholds for a range of frequencies on both the inside and outside areas of the wrist when exciting the skin with parallel vibrations, realized using the L5 actuator made by Lofelt GmbH. The vibrotactile threshold of 30 participants was measured using a modified audiometry test for the frequency range of 25–1,000 Hz. The average threshold across the respective frequencies was then ultimately determined from acceleration minima. The results show that maximum sensitivity lies in the range of 100–275 Hz (peaking at 200 Hz) for the inside and 75–250 Hz (peaking at 125 Hz) for the outside of the wrist and that thresholds are overall higher for the hairy skin on the outside of the wrist than for the glabrous skin on the inside. The results also show that the vibrotactile thresholds varied highly between individuals. Hence, personalized threshold measurements at the actuator locations will be required to fine-tune a device for the user. This study is a part of an ongoing research and development project where the aim is to develop a tactile display device and a music encoding scheme with the purpose of augmenting the musical enjoyment of cochlear implant recipients. These results, along with results from planned follow-up experiments, will be used to determine the appropriate frequency range and to cast light on the dynamic range on offer for the tactile device.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"53 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring Sonification Mapping Strategies for Spatial Auditory Guidance in Immersive Virtual Environments
Zihan Gao, Huiqiang Wang, Guangsheng Feng, Hongwu Lv
Pub Date: 2022-09-02. DOI: https://dl.acm.org/doi/10.1145/3528171

Spatial auditory cues are important for many tasks in immersive virtual environments, especially guidance tasks. However, because of the limited fidelity of spatial sounds rendered with generic Head-Related Transfer Functions (HRTFs), sound localization is usually of limited accuracy, especially in elevation, which can reduce the effectiveness of auditory guidance. To address this issue, we explored whether integrating sonification with spatial audio can enhance the perception of auditory guidance cues and thereby improve user performance in auditory guidance tasks. Specifically, we investigated the effect of sonification mapping strategy in a controlled experiment that compared four elevation sonification mappings: absolute elevation mapping, unsigned relative elevation mapping, signed relative elevation mapping, and binary relative elevation mapping. In addition, we examined whether azimuth sonification mapping can further aid the perception of spatial sounds. The results demonstrate that spatial auditory cues can be effectively enhanced by integrating elevation and azimuth sonification, significantly improving the accuracy and speed of guidance tasks. In particular, binary relative elevation mapping was generally the most effective of the four strategies, which suggests that auditory cues carrying clear directional information are key to efficient auditory guidance.
{"title":"Exploring Sonification Mapping Strategies for Spatial Auditory Guidance in Immersive Virtual Environments","authors":"Zihan Gao, Huiqiang Wang, Guangsheng Feng, Hongwu Lv","doi":"https://dl.acm.org/doi/10.1145/3528171","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3528171","url":null,"abstract":"<p>Spatial auditory cues are important for many tasks in immersive virtual environments, especially guidance tasks. However, due to the limited fidelity of spatial sounds rendered by generic Head-Related Transfer Functions (HRTFs), sound localization usually has a limited accuracy, especially in elevation, which can potentially impact the effectiveness of auditory guidance. To address this issue, we explored whether integrating sonification with spatial audio can enhance the perceptions of auditory guidance cues so user performance in auditory guidance tasks can be improved. Specifically, we investigated the effects of sonification mapping strategy using a controlled experiment that compared four elevation sonification mapping strategies: absolute elevation mapping, unsigned relative elevation mapping, signed relative elevation mapping, and binary relative elevation mapping. In addition, we examined whether azimuth sonification mapping can further benefit the perception of spatial sounds. The results demonstrate that spatial auditory cues can be effectively enhanced by integrating elevation and azimuth sonification, where the accuracy and speed of guidance tasks can be significantly improved. In particular, the overall results suggest that binary relative elevation mapping is generally the most effective strategy among four elevation sonification mapping strategies, which indicates that auditory cues with clear directional information are key to efficient auditory guidance.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"52 4","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine Learning–based Modeling and Prediction of the Intrinsic Relationship between Human Emotion and Music
Jun Su, Peng Zhou
Pub Date: 2022-09-02. DOI: https://dl.acm.org/doi/10.1145/3534966
Human emotion is one of the most complex psychophysiological phenomena and is reported to be significantly affected by music listening. This suggests an intrinsic relationship between human emotion and music that can be modeled and predicted quantitatively in a supervised manner. Here, a heuristic clustering analysis is carried out on a large-scale free music archive to derive a genre-diverse music library, and participants' emotional responses to this library are measured using a standard protocol, yielding a systematic emotion-to-music profile. Eight machine learning methods are employed to statistically correlate the basic sound features of the audio tracks with the measured emotional responses in a training set and to blindly predict emotional responses from sound features in a test set.

The study found that nonlinear methods are more robust and more accurate in prediction, but considerably more time-consuming, than linear approaches. Neural networks fit the training data strongly but suffer from significant overfitting. Among all the methods used, the support vector machine and the Gaussian process exhibit both high internal stability and satisfactory external predictability, and are therefore considered promising tools to model, predict, and explain the intrinsic relationship between human emotion and music. The psychological basis and perceptual implications of the trained models are also discussed to identify the key musical factors that affect human emotion.
{"title":"Machine Learning–based Modeling and Prediction of the Intrinsic Relationship between Human Emotion and Music","authors":"Jun Su, Peng Zhou","doi":"https://dl.acm.org/doi/10.1145/3534966","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3534966","url":null,"abstract":"<p>Human emotion is one of the most complex psychophysiological phenomena and has been reported to be affected significantly by music listening. It is supposed that there is an intrinsic relationship between human emotion and music, which can be modeled and predicted quantitatively in a supervised manner. Here, a heuristic clustering analysis is carried out on large-scale free music archive to derive a genre-diverse music library, to which the emotional response of participants is measured using a standard protocol, consequently resulting in a systematic emotion-to-music profile. Eight machine learning methods are employed to statistically correlate the basic sound features of music audio tracks in the library with the measured emotional response of tested people to the music tracks in a training set and to blindly predict the emotional response from sound features in a test set.</p><p>This study found that nonlinear methods are more robust and predictable but considerably more time-consuming than linear approaches. The neural networks have strong internal fittability but are associated with a significant overfitting issue. The support vector machine and Gaussian process exhibit both high internal stability and satisfactory external predictability in all used methods; they are considered as promising tools to model, predict, and explain the intrinsic relationship between human emotion and music. The psychological basis and perceptional implication underlying the built machine learning models are also discussed to find out the key music factors that affect human emotion.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"53 4","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Realism in Example-based Terrain Synthesis
Joshua J. Scott, Neil A. Dodgson
Pub Date: 2022-09-02. DOI: https://dl.acm.org/doi/10.1145/3531526
We report two studies that investigate the use of subjective believability in assessing the objective realism of terrain. The first demonstrates a clear subjective feature bias that depends on the type of terrain being evaluated: our participants found certain natural terrains more believable than others. This confounding factor means that a comparison experiment must not ask participants to compare terrains with different types of features. Our second experiment assesses four methods of example-based terrain synthesis, comparing them against each other and against real terrain. Our results show that, while all tested methods can produce terrain that is indistinguishable from reality, all can also produce poor terrain; that no one method is consistently better than the others; and that people with professional expertise in geology, cartography, or image analysis are better able than the general population to distinguish real terrain from synthesized terrain, whereas those with professional expertise in the visual arts are not.
{"title":"Evaluating Realism in Example-based Terrain Synthesis","authors":"Joshua J. Scott, Neil A. Dodgson","doi":"https://dl.acm.org/doi/10.1145/3531526","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3531526","url":null,"abstract":"<p>We report two studies that investigate the use of subjective believability in the assessment of objective realism of terrain. The first demonstrates that there is a clear subjective feature bias that depends on the types of terrain being evaluated: Our participants found certain natural terrains to be more believable than others. This confounding factor means that any comparison experiment must not ask participants to compare terrains with different types of features. Our second experiment assesses four methods of example-based terrain synthesis, comparing them against each other and against real terrain. Our results show that, while all tested methods can produce terrain that is indistinguishable from reality, all also can produce poor terrain; that there is no one method that is consistently better than the others; and that those who have professional expertise in geology, cartography, or image analysis are better able to distinguish real terrain from synthesized terrain than the general population, but those who have professional expertise in the visual arts are not.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"53 2","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating a Combination of Input Modalities, Canvas Geometries, and Inking Triggers on On-Air Handwriting in Virtual Reality
Roshan Venkatakrishnan, R. Venkatakrishnan, Chih-Han Chung, Yu-Shuen Wang, Sabarish V. Babu
Pub Date: 2022-09-01. DOI: https://doi.org/10.1145/3560817
Humans communicate by writing, and often take notes to assist thinking. With the growing popularity of collaborative Virtual Reality (VR) applications, it is imperative that we better understand the factors that affect writing in these virtual experiences. On-air writing is a popular writing paradigm in VR because it is simple to implement and requires no specialized hardware. A host of factors can affect the efficacy of this paradigm, and in this work we investigate them, aiming to understand the circumstances under which users can write in VR both effectively and efficiently. We studied the effects of the following factors: (1) input modality: brush vs. near-field raycast vs. pointing gesture; (2) inking trigger method: haptic feedback vs. button-based trigger; and (3) canvas geometry: plane vs. hemisphere. To evaluate writing performance, we conducted an empirical evaluation with thirty participants, requiring them to write the words we indicated under different combinations of these factors. Dependent measures including writing speed, accuracy, and perceived workload were analyzed. Results revealed that the brush-based input modality produced the best writing performance, that haptic feedback was not always more effective than button-based triggering, and that there are trade-offs associated with the different canvas geometries. This work lays a foundation for future investigations that seek to understand and further improve the on-air writing experience in immersive virtual environments.
{"title":"Investigating a Combination of Input Modalities, Canvas Geometries, and Inking Triggers on On-Air Handwriting in Virtual Reality","authors":"Roshan Venkatakrishnan, R. Venkatakrishnan, Chih-Han Chung, Yu-Shuen Wang, Sabarish V. Babu","doi":"10.1145/3560817","DOIUrl":"https://doi.org/10.1145/3560817","url":null,"abstract":"Humans communicate by writing, often taking notes that assist thinking. With the growing popularity of collaborative Virtual Reality (VR) applications, it is imperative that we better understand aspects that affect writing in these virtual experiences. On-air writing in VR is a popular writing paradigm due to its simplicity in implementation without any explicit needs for specialized hardware. A host of factors can affect the efficacy of this writing paradigm and in this work, we delved into investigating the same. Along these lines, we investigated the effects of a combination of factors on users’ on-air writing performance, aiming to understand the circumstances under which users can both effectively and efficiently write in VR. We were interested in studying the effects of the following factors: (1) input modality: brush vs. near-field raycast vs. pointing gesture, (2) inking trigger method: haptic feedback vs. button based trigger, and (3) canvas geometry: plane vs. hemisphere. To evaluate the writing performance, we conducted an empirical evaluation with thirty participants, requiring them to write the words we indicated under different combinations of these factors. Dependent measures including the writing speed, accuracy rates, perceived workloads, and so on, were analyzed. Results revealed that the brush based input modality produced the best results in writing performance, that haptic feedback was not always effective over button based triggering, and that there are trade-offs associated with the different types of canvas geometries used. This work attempts at laying a foundation for future investigations that seek to understand and further improve the on-air writing experience in immersive virtual environments.","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"19 1","pages":"1 - 19"},"PeriodicalIF":1.6,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45223407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Perceptual Guidelines for Optimizing Field of View in Stereoscopic Augmented Reality Displays
Minqi Wang, Emily A. Cooper
Pub Date: 2022-08-05. DOI: https://doi.org/10.1145/3554921

Near-eye display systems for augmented reality (AR) aim to seamlessly merge virtual content with the user's view of the real world. A substantial limitation of current systems is that they present virtual content over only a limited portion of the user's natural field of view (FOV). This limitation reduces the immersion and utility of these systems, so it is essential to quantify FOV coverage in AR systems and understand how to maximize it. Determining the FOV coverage of a monocular AR system is straightforward from the system architecture. However, stereoscopic AR systems that present 3D virtual content create a more complicated scenario, because the two eyes' views do not always completely overlap. Introducing partial binocular overlap in stereoscopic systems can potentially expand the perceived horizontal FOV coverage, but it can also introduce perceptual nonuniformity artifacts. In this article, we first review the principles of binocular FOV overlap for natural vision and for stereoscopic display systems. We report the results of a set of perceptual studies that examine how different amounts and types of horizontal binocular overlap in stereoscopic AR systems influence the perception of nonuniformity across the FOV. We then describe how to quantify the horizontal FOV in stereoscopic AR when taking 3D content into account. We show that all stereoscopic AR systems have variable horizontal FOV coverage and variable amounts of binocular overlap depending on fixation distance. Taken together, these results provide a framework for optimizing perceived FOV coverage and minimizing perceptual artifacts in stereoscopic AR systems for different use cases.
{"title":"Perceptual Guidelines for Optimizing Field of View in Stereoscopic Augmented Reality Displays","authors":"Minqi Wang, Emily A. Cooper","doi":"10.1145/3554921","DOIUrl":"https://doi.org/10.1145/3554921","url":null,"abstract":"Near-eye display systems for augmented reality (AR) aim to seamlessly merge virtual content with the user’s view of the real-world. A substantial limitation of current systems is that they only present virtual content over a limited portion of the user’s natural field of view (FOV). This limitation reduces the immersion and utility of these systems. Thus, it is essential to quantify FOV coverage in AR systems and understand how to maximize it. It is straightforward to determine the FOV coverage for monocular AR systems based on the system architecture. However, stereoscopic AR systems that present 3D virtual content create a more complicated scenario because the two eyes’ views do not always completely overlap. The introduction of partial binocular overlap in stereoscopic systems can potentially expand the perceived horizontal FOV coverage, but it can also introduce perceptual nonuniformity artifacts. In this arrticle, we first review the principles of binocular FOV overlap for natural vision and for stereoscopic display systems. We report the results of a set of perceptual studies that examine how different amounts and types of horizontal binocular overlap in stereoscopic AR systems influence the perception of nonuniformity across the FOV. We then describe how to quantify the horizontal FOV in stereoscopic AR when taking 3D content into account. We show that all stereoscopic AR systems result in a variable horizontal FOV coverage and variable amounts of binocular overlap depending on fixation distance. Taken together, these results provide a framework for optimizing perceived FOV coverage and minimizing perceptual artifacts in stereoscopic AR systems for different use cases.","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"19 1","pages":"1 - 23"},"PeriodicalIF":1.6,"publicationDate":"2022-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48370733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PTRM: Perceived Terrain Realism Metric
Suren Deepak Rajasekaran, Hao Kang, Martin Čadík, Eric Galin, Eric Guérin, Adrien Peytavie, Pavel Slavík, Bedrich Benes
Pub Date: 2022-07-11. DOI: https://dl.acm.org/doi/full/10.1145/3514244
Terrains are visually prominent and commonly needed objects in many computer graphics applications. While there are many algorithms for synthetic terrain generation, it is difficult to assess the realism of a generated output. This article presents a first step toward perceptual evaluation of terrain models. We gathered and categorized several classes of real terrains, and we generated synthetic terrain models using computer graphics methods. The terrain geometries were rendered with the same texturing, lighting, and camera position. Two studies on these image sets ranked the terrains perceptually and showed that the synthetic terrains are perceived as less realistic than the real ones. We provide insight into the features that affect perceived realism through a quantitative evaluation based on localized geomorphology-based landform features (geomorphons), which categorize terrain structures such as valleys, ridges, and hollows. We show that the presence or absence of certain features has a significant perceptual effect. The importance and presence of these terrain features were confirmed using a generative deep neural network that transferred features between the geometric models of the real and synthetic terrains; the feature transfer was followed by another perceptual experiment that further demonstrated their effect on perceived realism. We then introduce the Perceived Terrain Realism Metric (PTRM), which estimates the human-perceived realism of a terrain represented as a digital elevation map by relating the distribution of terrain features to their perceived realism. Applied to a synthetic terrain, the metric outputs an estimated level of perceived realism. We validated PTRM on real and synthetic data and compared it with the perceptual studies.
{"title":"PTRM: Perceived Terrain Realism Metric","authors":"Suren Deepak Rajasekaran, Hao Kang, Martin Čadík, Eric Galin, Eric Guérin, Adrien Peytavie, Pavel Slavík, Bedrich Benes","doi":"https://dl.acm.org/doi/full/10.1145/3514244","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3514244","url":null,"abstract":"<p>Terrains are visually prominent and commonly needed objects in many computer graphics applications. While there are many algorithms for synthetic terrain generation, it is rather difficult to assess the realism of a generated output. This article presents a first step toward the direction of perceptual evaluation for terrain models. We gathered and categorized several classes of real terrains, and we generated synthetic terrain models using computer graphics methods. The terrain geometries were rendered by using the same texturing, lighting, and camera position. Two studies on these image sets were conducted, ranking the terrains perceptually, and showing that the synthetic terrains are perceived as lacking realism compared to the real ones. We provide insight into the features that affect the perceived realism by a quantitative evaluation based on localized geomorphology-based landform features (geomorphons) that categorize terrain structures such as valleys, ridges, hollows, and so forth. We show that the presence or absence of certain features has a significant perceptual effect. The importance and presence of the terrain features were confirmed by using a generative deep neural network that transferred the features between the geometric models of the real terrains and the synthetic ones. The feature transfer was followed by another perceptual experiment that further showed their importance and effect on perceived realism. We then introduce <i>Perceived Terrain Realism Metrics</i> (PTRM), which estimates human-perceived realism of a terrain represented as a digital elevation map by relating the distribution of terrain features with their perceived realism. This metric can be used on a synthetic terrain, and it will output an estimated level of perceived realism. We validated the proposed metrics on real and synthetic data and compared them to the perceptual studies.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"54 3","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Display-Size Dependent Effects of 3D Viewing on Subjective Impressions
Yamato Miyashita, Yasuhito Sawahata, Akihiro Sakai, Masamitsu Harasawa, Kazuhiro Hara, Toshiya Morita, Kazuteru Komine
Pub Date: 2022-07-11. DOI: https://dl.acm.org/doi/full/10.1145/3510461

This paper describes how the screen size of 3D displays affects subjective impressions of 3D-visualized content. The key requirement for 3D displays is the presentation of depth cues comprising binocular disparities and/or motion parallax; however, developing displays and producing content that include these cues increases costs. Given the variety of screen sizes, viewers can be expected to experience 3D characteristics differently depending on screen size. We asked 48 participants to evaluate the 3D experience on three different-sized stereoscopic displays (11.5, 55, and 200 inches) equipped with head trackers. After viewing each of six stimuli, participants scored it on 20 opposite-term pairs based on the semantic differential method. Using factor analysis, we extracted three principal factors: power, related to terms such as strong three-dimensionality and real; visibility, related to stable and natural; and space, related to agile and open. These factors had proportions of variance of 0.317, 0.277, and 0.251, respectively, cumulating to 0.844. We confirmed that the three display sizes did not produce the same subjective impressions of 3D characteristics. In particular, on the small display we found larger effects of motion parallax on the power and space impressions (η² = 0.133 and 0.161, respectively) than on the other two sizes. We also found that binocular disparities degraded the visibility impressions, which might be caused by stereoscopic artifacts. The effects of 3D viewing on subjective impressions thus depend on display size, and small displays benefit most from adding 3D characteristics to 2D visualization.
{"title":"Display-Size Dependent Effects of 3D Viewing on Subjective Impressions","authors":"Yamato Miyashita, Yasuhito Sawahata, Akihiro Sakai, Masamitsu Harasawa, Kazuhiro Hara, Toshiya Morita, Kazuteru Komine","doi":"https://dl.acm.org/doi/full/10.1145/3510461","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3510461","url":null,"abstract":"<p>This paper describes how the screen size of 3D displays affect the subjective impressions of 3D-visualized content. The key requirement for 3D displays is the presentation of depth cues comprising binocular disparities and/or motion parallax; however, the development of displays and production of content that include these cues leads to an increase in costs. Given the variety of screen sizes, it is expected that 3D characteristics are experienced differently by viewers depending on the screen size. We asked 48 participants to evaluate the 3D experience when using three different-sized stereoscopic displays (11.5, 55, and 200 inches) with head trackers. The participants were asked to score presented stimuli on 20 opposite-term pairs based on the semantic differential method after viewing each of six stimuli. Using factor analysis, we extracted three principal factors: <i>power</i>, related to strong three-dimensionality, real, etc.; <i>visibility</i>, related to stable, natural, etc.; and <i>space</i>, related to agile, open, etc., which had proportions of variances of 0.317, 0.277, and 0.251, respectively; their cumulation was 0.844. We confirmed that the three different-sized displays did not produce the same subjective impressions of the 3D characteristics. In particular, on the small-sized display, we found larger effects on power and space impressions from motion parallax (η<sup>2</sup> = 0.133 and 0.161, respectively) than for the other two sizes. We found degradation of the visibility impressions from binocular disparities, which might be caused by artifacts from stereoscopy. The effects of 3D viewing on subjective impression depends on the display size, and small-sized displays offer the largest benefits by adding 3D characteristics to 2D visualization.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"34 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138517394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Immersive Properties of High Dynamic Range Video
Stephen J. Hinde, Katy C. Noland, Graham A. Thomas, David R. Bull, Iain D. Gilchrist
Pub Date: 2022-07-11. DOI: https://dl.acm.org/doi/full/10.1145/3524692
This paper presents the results of two studies that used a dual-task methodology to measure an audience's experience of immersion while watching video under typical television viewing conditions. Immersion was measured while participants watched either high dynamic range, wide color gamut video or standard dynamic range, standard color gamut video, in high definition or ultra-high definition. Other video parameters were carefully measured and controlled.
The study found that high dynamic range, wide color gamut video is significantly more immersive than standard dynamic range, standard color gamut video in the chosen configuration. However, there was no evidence of significant differences in immersion between high-definition and ultra-high-definition resolutions.
{"title":"On the Immersive Properties of High Dynamic Range Video","authors":"Stephen J. Hinde, Katy C. Noland, Graham A. Thomas, David R. Bull, Iain D. Gilchrist","doi":"https://dl.acm.org/doi/full/10.1145/3524692","DOIUrl":"https://doi.org/https://dl.acm.org/doi/full/10.1145/3524692","url":null,"abstract":"<p>This paper presents the results from two studies which used a dual-task methodology to measure an audience's experience of immersion while watching video under typical television viewing conditions. Immersion was measured while participants watched either a high dynamic range, wide color gamut video or a standard dynamic range, standard color gamut video, in high definition or ultra-high definition. Other video parameters were carefully measured and controlled.</p><p>The study found that high dynamic range, wide color gamut video is significantly more immersive than standard dynamic range, standard color gamut video in the chosen configuration. However, there was no evidence of significant differences in immersion between high-definition and ultra-high-definition resolutions.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"231 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138517347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}