Creating Word Paintings Jointly Considering Semantics, Attention, and Aesthetics
Junsong Zhang, Zuyi Yang, Li Jin, Zhitang Lu, Jinhui Yu
https://doi.org/10.1145/3539610 (ACM Transactions on Applied Perception, published 2022-07-04)

In this article, we present a content-aware method for generating word paintings. A word painting is a composite artwork assembled from words extracted from a given text, carrying semantics and visual features similar to those of a given source image. Word paintings are usually created by skilled artists through tedious manual processes, especially when generating streamlines and laying out text, so we provide an easy method for users to create them. The key challenge in generating a visually pleasing word painting is designing a textural layout that simultaneously conveys the input image and gives easy access to the semantic theme. To address this challenge, given an image and its content-related text, we first decompose the input image into several regions and approximate each region with a smooth vector field. At the same time, by analyzing the input text, we extract weighted keywords to serve as graphic elements. Then, to measure how likely each position in the input image is to attract an observer's attention, we generate a saliency map with our trained visual attention model. Finally, jointly considering visual attention and aesthetic rules, we propose an energy-based optimization framework that arranges the extracted keywords into the decomposed regions and synthesizes the word painting. Experimental results and user studies show that the method generates fashionable and appealing word paintings.
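As a rough illustration of the energy-based layout idea, the sketch below scores candidate keyword placements against a saliency map plus a crude crowding penalty and improves them by random search. The energy terms, weights, and optimizer are assumptions for illustration only; the paper's actual formulation also handles region decomposition, vector fields, and typography.

```python
# Toy energy-based keyword layout (illustrative assumptions only; not the
# authors' actual energy terms or optimizer).
import numpy as np

rng = np.random.default_rng(0)

H, W = 64, 64
saliency = rng.random((H, W))  # stand-in for the trained attention model's map
keywords = [("ocean", 1.0), ("wave", 0.7), ("storm", 0.5)]  # (word, weight)

def energy(positions, keywords, saliency, min_dist=8.0):
    """Lower is better: heavier keywords should sit on salient pixels
    (attention term) and words should not crowd each other (aesthetic term)."""
    e = 0.0
    for (y, x), (_, w) in zip(positions, keywords):
        e -= w * saliency[int(y), int(x)]
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = float(np.hypot(*(positions[i] - positions[j])))
            if d < min_dist:
                e += (min_dist - d) / min_dist  # overlap penalty
    return e

# Simple random search over placements.
best = rng.uniform([0, 0], [H - 1, W - 1], size=(len(keywords), 2))
best_e = energy(best, keywords, saliency)
for _ in range(2000):
    cand = np.clip(best + rng.normal(0, 3, best.shape), 0, [H - 1, W - 1])
    cand_e = energy(cand, keywords, saliency)
    if cand_e < best_e:
        best, best_e = cand, cand_e

print(f"final energy {best_e:.3f}\nplacements:\n{best.round(1)}")
```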
Exploring Sonification Mapping Strategies for Spatial Auditory Guidance in Immersive Virtual Environments
Zihan Gao, Hui-qiang Wang, Guangsheng Feng, Hongwu Lv
https://doi.org/10.1145/3528171 (ACM Transactions on Applied Perception, published 2022-05-31)

Spatial auditory cues are important for many tasks in immersive virtual environments, especially guidance tasks. However, because of the limited fidelity of spatial sounds rendered with generic Head-Related Transfer Functions (HRTFs), sound localization accuracy is usually limited, especially in elevation, which can reduce the effectiveness of auditory guidance. To address this issue, we explored whether integrating sonification with spatial audio can enhance the perception of auditory guidance cues and thereby improve user performance in auditory guidance tasks. Specifically, we investigated the effects of sonification mapping strategy in a controlled experiment that compared four elevation sonification mapping strategies: absolute elevation mapping, unsigned relative elevation mapping, signed relative elevation mapping, and binary relative elevation mapping. In addition, we examined whether azimuth sonification mapping can further benefit the perception of spatial sounds. The results demonstrate that spatial auditory cues can be effectively enhanced by integrating elevation and azimuth sonification, significantly improving the accuracy and speed of guidance tasks. In particular, the overall results suggest that binary relative elevation mapping is generally the most effective of the four elevation strategies, indicating that auditory cues with clear directional information are key to efficient auditory guidance.
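The four elevation strategies compared above lend themselves to a compact illustration. Below is a minimal sketch of how each strategy might map an elevation error to a cue parameter; mapping to pitch and the specific frequency range are assumptions, not the paper's implementation.

```python
# Minimal sketch of the four elevation sonification mapping strategies
# (illustrative parameter choices, not the paper's implementation).
def sonify_elevation(target_deg, current_deg, strategy,
                     f_low=220.0, f_high=880.0, span_deg=90.0):
    """Return a cue frequency in Hz under the chosen mapping strategy."""
    err = target_deg - current_deg  # signed elevation error
    if strategy == "absolute":
        # Pitch encodes the target's absolute elevation.
        frac = (target_deg + span_deg / 2) / span_deg
    elif strategy == "unsigned_relative":
        # Pitch encodes |error|: how far off, but not in which direction.
        frac = min(abs(err), span_deg) / span_deg
    elif strategy == "signed_relative":
        # Pitch encodes the signed error: above mid-pitch means "aim higher".
        frac = 0.5 + max(-span_deg, min(err, span_deg)) / (2 * span_deg)
    elif strategy == "binary_relative":
        # Two tones only: an unambiguous up-vs-down direction cue.
        frac = 1.0 if err > 0 else 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return f_low + frac * (f_high - f_low)

for s in ("absolute", "unsigned_relative", "signed_relative", "binary_relative"):
    print(f"{s}: {sonify_elevation(20.0, -10.0, s):.1f} Hz")
```

The binary mapping trades precision for clarity, which is consistent with the paper's finding that cues carrying clear directional information support the most efficient guidance.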
Vibrotactile Threshold Measurements at the Wrist Using Parallel Vibration Actuators
Elvar Atli Ævarsson, Thórhildur Ásgeirsdóttir, Finnur Pind, Á. Kristjánsson, Runar Unnthorsson
https://doi.org/10.1145/3529259 (ACM Transactions on Applied Perception, published 2022-05-27)

This article presents an investigation into perceptual vibrotactile thresholds at a range of frequencies on both the inside and outside of the wrist when exciting the skin with parallel vibrations, realized using the L5 actuator made by Lofelt GmbH. The vibrotactile thresholds of 30 participants were measured using a modified audiometry test over the frequency range 25–1,000 Hz, and the average threshold at each frequency was then determined from acceleration minima. The results show that maximum sensitivity lies in the range 100–275 Hz (peaking at 200 Hz) for the inside of the wrist and 75–250 Hz (peaking at 125 Hz) for the outside, and that thresholds are overall higher for the hairy skin on the outside of the wrist than for the glabrous skin on the inside. The results also show that vibrotactile thresholds varied considerably between individuals; personalized threshold measurements at the actuator locations will therefore be required to fine-tune a device for its user. This study is part of an ongoing research and development project whose aim is to develop a tactile display device and a music encoding scheme for augmenting the musical enjoyment of cochlear implant recipients. These results, along with results from planned follow-up experiments, will be used to determine the appropriate frequency range and to shed light on the dynamic range available to the tactile device.
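The abstract mentions a modified audiometry test. Threshold tests of this kind are often run as adaptive staircases; the sketch below shows a generic 1-up/2-down staircase with a simulated deterministic observer. The rule, step size, and observer model are illustrative assumptions, not the study's exact protocol.

```python
# Generic 1-up/2-down adaptive staircase with a simulated observer -- an
# illustrative stand-in for a modified audiometry threshold test.
def detects(level_db, true_threshold_db=-30.0):
    """Toy observer: feels the vibration whenever level >= threshold."""
    return level_db >= true_threshold_db

def staircase(start_db=0.0, step_db=4.0, n_reversals=8):
    level, direction, reversals, streak = start_db, -1, [], 0
    while len(reversals) < n_reversals:
        if detects(level):
            streak += 1
            if streak == 2:              # 2-down: lower level after 2 detections
                streak = 0
                if direction == +1:
                    reversals.append(level)
                direction, level = -1, level - step_db
        else:
            streak = 0                   # 1-up: raise level after any miss
            if direction == -1:
                reversals.append(level)
            direction, level = +1, level + step_db
    return sum(reversals[-6:]) / 6       # average of the last 6 reversals

print(f"estimated threshold: {staircase():.1f} dB")
```

A real protocol would add randomized starting levels, catch trials, and a separate run per frequency and actuator site.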
Machine Learning–based Modeling and Prediction of the Intrinsic Relationship between Human Emotion and Music
Jun Su, Pengcheng Zhou
https://doi.org/10.1145/3534966 (ACM Transactions on Applied Perception, published 2022-05-23)

Human emotion is one of the most complex psychophysiological phenomena and is reported to be significantly affected by music listening. We posit an intrinsic relationship between human emotion and music that can be modeled and predicted quantitatively in a supervised manner. Here, a heuristic clustering analysis is carried out on a large-scale free music archive to derive a genre-diverse music library, and participants' emotional responses to this library are measured using a standard protocol, resulting in a systematic emotion-to-music profile. Eight machine learning methods are employed to statistically correlate basic sound features of the audio tracks with the measured emotional responses in a training set and to blindly predict emotional responses from sound features in a test set. We found that nonlinear methods are more robust and predictive, but considerably more time-consuming, than linear approaches. The neural networks fit the training data well but suffer from significant overfitting. Of all the methods used, the support vector machine and Gaussian process exhibit both high internal stability and satisfactory external predictability; we consider them promising tools to model, predict, and explain the intrinsic relationship between human emotion and music. The psychological basis and perceptual implications underlying the built machine learning models are also discussed to identify the key musical factors that affect human emotion.
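An illustrative sketch of the modeling setup follows: regress emotional ratings on basic audio features and compare an SVM with a Gaussian process. Synthetic data stand in for the real feature/rating tables, and the hyperparameters are arbitrary assumptions.

```python
# Sketch: compare SVR and Gaussian process regression on toy feature/rating
# data standing in for the paper's emotion-to-music profile.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))  # e.g., tempo, RMS energy, spectral centroid, ...
y = X @ rng.normal(size=8) + 0.3 * np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=1.0)),
    "GP":  make_pipeline(StandardScaler(),
                         GaussianProcessRegressor(kernel=RBF(), alpha=1e-2)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name} test R^2: {model.score(X_te, y_te):.3f}")
```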
Evaluating Realism in Example-based Terrain Synthesis
Joshua J. Scott, N. Dodgson
https://doi.org/10.1145/3531526 (ACM Transactions on Applied Perception, published 2022-05-02)

We report two studies that investigate the use of subjective believability in assessing the objective realism of terrain. The first demonstrates a clear subjective feature bias that depends on the type of terrain being evaluated: our participants found certain natural terrains more believable than others. This confounding factor means that a comparison experiment must not ask participants to compare terrains with different types of features. Our second experiment assesses four methods of example-based terrain synthesis, comparing them against each other and against real terrain. Our results show that, while every tested method can produce terrain that is indistinguishable from reality, every method can also produce poor terrain; that no one method is consistently better than the others; and that people with professional expertise in geology, cartography, or image analysis are better able to distinguish real terrain from synthesized terrain than the general population, whereas those with professional expertise in the visual arts are not.
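One common way to operationalize "indistinguishable from reality" is a two-alternative forced choice in which chance-level accuracy at picking the real terrain means observers cannot tell the difference. The sketch below shows that test on made-up counts; the 2AFC framing and the numbers are assumptions, not the paper's exact procedure or data.

```python
# Sketch of a binomial test for "indistinguishable from reality" in a
# hypothetical 2AFC task (counts are fabricated for illustration).
from scipy import stats

n_trials, n_correct = 60, 33  # hypothetical responses for one synthesis method
result = stats.binomtest(n_correct, n_trials, p=0.5)
print(f"accuracy {n_correct / n_trials:.2f}, p = {result.pvalue:.3f} "
      "(p > .05: no evidence observers can separate real from synthetic)")
```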
The Duration of an Auditory Icon Can Affect How the Listener Interprets Its Meaning
João P. Cabral, G. Remijn
https://doi.org/10.1145/3527269 (ACM Transactions on Applied Perception, published 2022-03-28)

Initially introduced in the field of informatics, an auditory icon is a short everyday sound used to represent a specific event, object, function, or action. Auditory icons have been studied in various fields and, compared with other types of auditory alarms, can be very efficient in informing the listener about a situation or event. So far, auditory icons have been used with a wide range of durations, from a few hundred milliseconds up to several seconds. Little is known, however, about whether and how an icon's duration influences its interpretation. In the present study, we therefore asked listeners to rate 12 auditory icons, divided into four sound categories (nonverbal human sounds, machine sounds, human activities, and animal vocalizations), at five durations (200, 400, 800, 1,600, and 3,200 ms). They rated (1) how appropriately the icon sound itself represented the icon's referent and (2) how appropriately each duration of the icon sound represented the icon's referent. Overall, the results demonstrate that the duration of the auditory icons in this stimulus set can directly affect how well an icon represents its referent. Icons characterized by human activities represented their referents most appropriately at relatively short durations (400 or 800 ms), whereas the majority of icons consisting of machine sounds, nonverbal human sounds, and animal vocalizations represented their referents more appropriately at longer durations (800 and 1,600 ms). Further systematic research is necessary to determine whether the duration effects shown here generalize to other stimulus sets.
On the Immersive Properties of High Dynamic Range Video
S. Hinde, K. Noland, Graham Thomas, David R. Bull, I. Gilchrist
https://doi.org/10.1145/3524692 (ACM Transactions on Applied Perception, published 2022-03-28)

This paper presents results from two studies that used a dual-task methodology to measure an audience's experience of immersion while watching video under typical television viewing conditions. Immersion was measured while participants watched either high dynamic range, wide color gamut video or standard dynamic range, standard color gamut video, in either high definition or ultra-high definition; other video parameters were carefully measured and controlled. The study found that high dynamic range, wide color gamut video is significantly more immersive than standard dynamic range, standard color gamut video in the chosen configuration. However, there was no evidence of a significant difference in immersion between high-definition and ultra-high-definition resolutions.
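The dual-task logic can be sketched compactly: immersion in the primary viewing task is inferred from slowed reaction times to a secondary task. The numbers and the paired t-test below are illustrative assumptions, not the paper's design or data.

```python
# Hedged sketch of a dual-task immersion comparison on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated secondary-task reaction times (seconds) for 30 viewers.
rt_hdr_wcg = rng.normal(0.52, 0.06, 30)  # more immersed -> slower to respond
rt_sdr     = rng.normal(0.47, 0.06, 30)

t, p = stats.ttest_rel(rt_hdr_wcg, rt_sdr)
print(f"paired t = {t:.2f}, p = {p:.4f} (longer RTs read as higher immersion)")
```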
PTRM: Perceived Terrain Realism Metric
S. D. Rajasekaran, Hao Kang, Martin Čadík, Eric Galin, É. Guérin, A. Peytavie, P. Slavík, Bedrich Benes
https://doi.org/10.1145/3514244 (ACM Transactions on Applied Perception, published 2022-02-11)

Terrains are visually prominent and commonly needed objects in many computer graphics applications. While there are many algorithms for synthetic terrain generation, it is rather difficult to assess the realism of a generated output. This article presents a first step toward perceptual evaluation of terrain models. We gathered and categorized several classes of real terrains, and we generated synthetic terrain models using computer graphics methods. The terrain geometries were rendered with the same texturing, lighting, and camera position. Two studies on these image sets ranked the terrains perceptually and showed that the synthetic terrains are perceived as less realistic than the real ones. We provide insight into the features that affect perceived realism through a quantitative evaluation based on localized geomorphology-based landform features (geomorphons), which categorize terrain structures such as valleys, ridges, and hollows. We show that the presence or absence of certain features has a significant perceptual effect. The importance and presence of the terrain features were confirmed by using a generative deep neural network to transfer features between the geometric models of the real and synthetic terrains; the feature transfer was followed by another perceptual experiment that further showed their importance and effect on perceived realism. We then introduce the Perceived Terrain Realism Metric (PTRM), which estimates the human-perceived realism of a terrain represented as a digital elevation map by relating the distribution of terrain features to their perceived realism. Applied to a synthetic terrain, the metric outputs an estimated level of perceived realism. We validated the proposed metric on real and synthetic data and compared it to the perceptual studies.
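The following is a loose sketch of the intuition behind a geomorphon-style realism metric: classify each DEM cell by its 8-neighbor elevation pattern, then score a synthetic terrain by how closely its landform histogram matches real-terrain statistics. The simplified classifier and the histogram distance are assumptions; the published PTRM relates feature distributions to measured realism ratings and is more sophisticated.

```python
# Simplified landform-histogram realism score (illustrative only; not PTRM).
import numpy as np

def landform_histogram(dem, flat_tol=0.5):
    """Distribution over 'number of higher neighbors' (0-8), a crude
    stand-in for the geomorphon ternary pattern."""
    h, w = dem.shape
    counts = np.zeros(9)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nbrs = dem[y - 1:y + 2, x - 1:x + 2]
            higher = int(np.sum(nbrs - dem[y, x] > flat_tol))
            counts[higher] += 1
    return counts / counts.sum()

def realism_score(synthetic_dem, real_hist):
    """1 minus the total-variation distance between landform distributions."""
    syn_hist = landform_histogram(synthetic_dem)
    return 1.0 - 0.5 * float(np.abs(syn_hist - real_hist).sum())

rng = np.random.default_rng(1)
real = np.cumsum(np.cumsum(rng.normal(size=(64, 64)), axis=0), axis=1)  # smooth
noise = rng.normal(size=(64, 64))                                       # rough
ref = landform_histogram(real)
print(f"real  vs real stats: {realism_score(real, ref):.3f}")
print(f"noise vs real stats: {realism_score(noise, ref):.3f}")
```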
Display-Size Dependent Effects of 3D Viewing on Subjective Impressions
Yamato Miyashita, Y. Sawahata, Akihiro Sakai, M. Harasawa, Kazuhiro Hara, T. Morita, K. Komine
https://doi.org/10.1145/3510461 (ACM Transactions on Applied Perception, published 2022-01-26)

This paper describes how the screen size of 3D displays affects subjective impressions of 3D-visualized content. The key requirement for 3D displays is the presentation of depth cues comprising binocular disparities and/or motion parallax; however, developing displays and producing content that include these cues increases costs. Given the variety of screen sizes, 3D characteristics are likely to be experienced differently by viewers depending on screen size. We asked 48 participants to evaluate the 3D experience on three different-sized stereoscopic displays (11.5, 55, and 200 inches) equipped with head trackers. After viewing each of six stimuli, participants scored it on 20 opposite-term pairs based on the semantic differential method. Using factor analysis, we extracted three principal factors: power (related to strong three-dimensionality, real, etc.), visibility (related to stable, natural, etc.), and space (related to agile, open, etc.), with proportions of variance of 0.317, 0.277, and 0.251, respectively, cumulatively 0.844. We confirmed that the three display sizes did not produce the same subjective impressions of the 3D characteristics. In particular, on the small display we found larger effects of motion parallax on power and space impressions (η² = 0.133 and 0.161, respectively) than on the other two sizes. We also found degradation of visibility impressions from binocular disparities, which might be caused by stereoscopic artifacts. The effects of 3D viewing on subjective impressions thus depend on display size, and small displays benefit most from adding 3D characteristics to 2D visualization.
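A minimal sketch of the analysis pipeline implied above: semantic-differential ratings on 20 scales reduced to three factors. Synthetic ratings replace the real data, and sklearn's FactorAnalysis (without rotation) is an assumed stand-in for the authors' exact factor-analysis procedure.

```python
# Sketch: recover three latent factors from simulated semantic-differential
# ratings (48 viewers x 20 opposite-term scales).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_viewers, n_scales = 48, 20

# Ratings driven by three latent factors (think: power, visibility, space).
latent = rng.normal(size=(n_viewers, 3))
loadings = rng.normal(size=(3, n_scales))
ratings = latent @ loadings + rng.normal(0, 0.5, (n_viewers, n_scales))

fa = FactorAnalysis(n_components=3, random_state=0).fit(ratings)
ssq = np.sum(fa.components_ ** 2, axis=1)  # crude per-factor loading mass
print("proportion per factor:", np.round(ssq / ssq.sum(), 3))
```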