Connor J Parde, Virginia E Strehle, Vivekjyoti Banerjee, Ying Hu, Jacqueline G Cavazos, Carlos D Castillo, Alice J O'Toole
Deep convolutional neural networks (DCNNs) have achieved human-level accuracy in face identification (Phillips et al., 2018), though it is unclear how accurately they discriminate highly similar faces. Here, humans and a DCNN performed a challenging face-identity matching task that included identical twins. Participants (N = 87) viewed pairs of face images of three types: same-identity, general imposters (different identities from similar demographic groups), and twin imposters (identical twin siblings). The task was to determine whether the pairs showed the same person or different people. Identity comparisons were tested in three viewpoint-disparity conditions: frontal to frontal, frontal to 45° profile, and frontal to 90° profile. Accuracy for discriminating matched-identity pairs from twin-imposter pairs and general-imposter pairs was assessed in each viewpoint-disparity condition. Humans were more accurate for general-imposter pairs than twin-imposter pairs, and accuracy declined with increased viewpoint disparity between the images in a pair. A DCNN trained for face identification (Ranjan et al., 2018) was tested on the same image pairs presented to humans. Machine performance mirrored the pattern of human accuracy, but with performance at or above all humans in all but one condition. Human and machine similarity scores were compared across all image-pair types. This item-level analysis showed that human and machine similarity ratings correlated significantly in six of nine image-pair types [range r = 0.38 to r = 0.63], suggesting general accord between the perception of face similarity by humans and the DCNN. These findings also contribute to our understanding of DCNN performance for discriminating high-resemblance faces, demonstrate that the DCNN performs at a level at or above humans, and suggest a degree of parity between the features used by humans and the DCNN.
{"title":"Twin Identification over Viewpoint Change: A Deep Convolutional Neural Network Surpasses Humans.","authors":"Connor J Parde, Virginia E Strehle, Vivekjyoti Banerjee, Ying Hu, Jacqueline G Cavazos, Carlos D Castillo, Alice J O'Toole","doi":"10.1145/3609224","DOIUrl":"10.1145/3609224","url":null,"abstract":"<p><p>Deep convolutional neural networks (DCNNs) have achieved human-level accuracy in face identification (Phillips et al., 2018), though it is unclear how accurately they discriminate highly-similar faces. Here, humans and a DCNN performed a challenging face-identity matching task that included identical twins. Participants (<i>N</i> = 87) viewed pairs of face images of three types: same-identity, general imposters (different identities from similar demographic groups), and twin imposters (identical twin siblings). The task was to determine whether the pairs showed the same person or different people. Identity comparisons were tested in three viewpoint-disparity conditions: frontal to frontal, frontal to 45° profile, and frontal to 90°profile. Accuracy for discriminating matched-identity pairs from twin-imposter pairs and general-imposter pairs was assessed in each viewpoint-disparity condition. Humans were more accurate for general-imposter pairs than twin-imposter pairs, and accuracy declined with increased viewpoint disparity between the images in a pair. A DCNN trained for face identification (Ranjan et al., 2018) was tested on the same image pairs presented to humans. Machine performance mirrored the pattern of human accuracy, but with performance at or above all humans in all but one condition. Human and machine similarity scores were compared across all image-pair types. This item-level analysis showed that human and machine similarity ratings correlated significantly in six of nine image-pair types [range <i>r</i> = 0.38 to <i>r</i> = 0.63], suggesting general accord between the perception of face similarity by humans and the DCNN. These findings also contribute to our understanding of DCNN performance for discriminating high-resemblance faces, demonstrate that the DCNN performs at a level at or above humans, and suggest a degree of parity between the features used by humans and the DCNN.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"20 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11315461/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42806850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jagan Krishnasamy Balasubramanian, Rahul Kumar Ray, Manivannan Muniyandi
Electrovibration is used in touch-enabled devices to render different textures. Tactile sub-modal stimuli can enhance texture perception when presented along with electrovibration stimuli. Perception of texture depends on the threshold of electrovibration. In the current study, we conducted a psychophysical experiment with 13 participants to investigate the effect of introducing a subthreshold electrotactile stimulus (SES) on the perception of electrovibration. Interaction between tactile sub-modal stimuli can cause one stimulus to be masked in the presence of another. This study explored the occurrence of tactile masking of electrovibration by an electrotactile stimulus. The results indicate a reduction of the electrovibration threshold by 12.46% and 6.75% when the electrotactile stimulus was at 90% and 80% of its perception threshold, respectively. This method was tested over a wide range of frequencies from 20 Hz to 320 Hz in the tuning curve, and the variation in percentage reduction with frequency is reported. Another experiment was conducted to measure the perception of combined stimuli on the Likert scale. The results showed that the perception was more inclined towards the electrovibration at 80% of SES and was indifferent at 90% of SES. The reduction in the threshold of electrovibration reveals that the effect of tactile masking by the electrotactile stimulus was not prevalent under subthreshold conditions. This study provides significant insights into developing a texture rendering algorithm based on tactile sub-modal stimuli in the future.
{"title":"Effect of Subthreshold Electrotactile Stimulation on the Perception of Electrovibration","authors":"Jagan Krishnasamy Balasubramanian, R. Ray, Manivannan Muniyandi","doi":"10.1145/3599970","DOIUrl":"https://doi.org/10.1145/3599970","url":null,"abstract":"Electrovibration is used in touch enabled devices to render different textures. Tactile sub-modal stimuli can enhance texture perception when presented along with electrovibration stimuli. Perception of texture depends on the threshold of electrovibration. In the current study, we have conducted a psychophysical experiment on 13 participants to investigate the effect of introducing a subthreshold electrotactile stimulus (SES) to the perception of electrovibration. Interaction of tactile sub-modal stimuli causes masking of a stimulus in the presence of another stimulus. This study explored the occurrence of tactile masking of electrovibration by electrotactile stimulus. The results indicate the reduction of electrovibration threshold by 12.46% and 6.75% when the electrotactile stimulus was at 90% and 80% of its perception threshold, respectively. This method was tested over a wide range of frequencies from 20 Hz to 320 Hz in the tuning curve, and the variation in percentage reduction with frequency is reported. Another experiment was conducted to measure the perception of combined stimuli on the Likert scale. The results showed that the perception was more inclined towards the electrovibration at 80% of SES and was indifferent at 90% of SES. The reduction in the threshold of electrovibration reveals that the effect of tactile masking by electrotactile stimulus was not prevalent under subthreshold conditions. This study provides significant insights into developing a texture rendering algorithm based on tactile sub-modal stimuli in the future.","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"20 1","pages":"1 - 16"},"PeriodicalIF":1.6,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45952438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-21. DOI: https://dl.acm.org/doi/10.1145/3588317
Weng Khuan Hoh, Fang-Lue Zhang, Neil A. Dodgson
We investigate the optimal aesthetic location and size of a single dominant salient region in a photographic image. Existing algorithms for photographic composition do not take full account of the spatial positioning or sizes of these salient regions. We present a set of experiments to assess aesthetic preferences, inspired by theories of centeredness, principal lines, and the Rule of Thirds. Our experimental results show a clear preference for the salient region to be centered in the image and that there is a preferred size of non-salient border around this salient region. We thus propose a novel image cropping mechanism for images containing a single salient region to achieve the best aesthetic balance. Our results show that the Rule-of-Thirds guideline is not generally valid; they also allow us to hypothesize in which situations it is useful and in which it is inappropriate.
{"title":"Salient-Centeredness and Saliency Size in Computational Aesthetics","authors":"Weng Khuan Hoh, Fang-Lue Zhang, Neil A. Dodgson","doi":"https://dl.acm.org/doi/10.1145/3588317","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3588317","url":null,"abstract":"<p>We investigate the optimal aesthetic location and size of a single dominant salient region in a photographic image. Existing algorithms for photographic composition do not take full account of the spatial positioning or sizes of these salient regions. We present a set of experiments to assess aesthetic preferences, inspired by theories of centeredness, principal lines, and Rule-of-Thirds. Our experimental results show a clear preference for the salient region to be centered in the image and that there is a preferred size of non-salient border around this salient region. We thus propose a novel image cropping mechanism for images containing a single salient region to achieve the best aesthetic balance. Our results show that the Rule-of-Thirds guideline is not generally valid but also allow us to hypothesize in which situations it is useful and in which it is inappropriate.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"8 4","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-21. DOI: https://dl.acm.org/doi/10.1145/3583072
Luca Surace, Marek Wernikowski, Cara Tursun, Karol Myszkowski, Radosław Mantiuk, Piotr Didyk
A foveated image can be entirely reconstructed from a sparse set of samples distributed according to the retinal sensitivity of the human visual system, which rapidly decreases with increasing eccentricity. The use of generative adversarial networks (GANs) has recently been shown to be a promising solution for such a task, as they can successfully hallucinate missing image information. As in the case of other supervised learning approaches, the definition of the loss function and the training strategy heavily influence the quality of the output. In this work, we consider the problem of efficiently guiding the training of foveated reconstruction techniques such that they are more aware of the capabilities and limitations of the human visual system, and thus can reconstruct visually important image features. Our primary goal is to make the training procedure less sensitive to distortions that humans cannot detect and focus on penalizing perceptually important artifacts. Given the nature of GAN-based solutions, we focus on the sensitivity of human vision to hallucination in the case of input samples with different densities. We propose psychophysical experiments, a dataset, and a procedure for training foveated image reconstruction. The proposed strategy renders the generator network flexible by penalizing only perceptually important deviations in the output. As a result, the method emphasizes the recovery of perceptually important image features. We evaluated our strategy and compared it with alternative solutions by using a newly trained objective metric, a recent foveated video quality metric, and user experiments. Our evaluations revealed significant improvements in the perceived image reconstruction quality compared with the standard GAN-based training approach.
{"title":"Learning GAN-Based Foveated Reconstruction to Recover Perceptually Important Image Features","authors":"Luca Surace, Marek Wernikowski, Cara Tursun, Karol Myszkowski, Radosław Mantiuk, Piotr Didyk","doi":"https://dl.acm.org/doi/10.1145/3583072","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3583072","url":null,"abstract":"<p>A foveated image can be entirely reconstructed from a sparse set of samples distributed according to the retinal sensitivity of the human visual system, which rapidly decreases with increasing eccentricity. The use of generative adversarial networks (GANs) has recently been shown to be a promising solution for such a task, as they can successfully hallucinate missing image information. As in the case of other supervised learning approaches, the definition of the loss function and the training strategy heavily influence the quality of the output. In this work,we consider the problem of efficiently guiding the training of foveated reconstruction techniques such that they are more aware of the capabilities and limitations of the human visual system, and thus can reconstruct visually important image features. Our primary goal is to make the training procedure less sensitive to distortions that humans cannot detect and focus on penalizing perceptually important artifacts. Given the nature of GAN-based solutions, we focus on the sensitivity of human vision to hallucination in case of input samples with different densities. We propose psychophysical experiments, a dataset, and a procedure for training foveated image reconstruction. The proposed strategy renders the generator network flexible by penalizing only perceptually important deviations in the output. As a result, the method emphasized the recovery of perceptually important image features. We evaluated our strategy and compared it with alternative solutions by using a newly trained objective metric, a recent foveated video quality metric, and user experiments. Our evaluations revealed significant improvements in the perceived image reconstruction quality compared with the standard GAN-based training approach.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"9 3","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-06. DOI: https://dl.acm.org/doi/10.1145/3579357
Mor Shamy, Dror G. Feitelson
Eye tracking studies have shown that reading code, in contradistinction to reading text, includes many vertical jumps. As different lines of code may have quite different functions (e.g., variable definition, flow control, or computation), it is important to accurately identify the lines being read. We design experiments that require a specific line of text to be scrutinized. Using the distribution of gazes around this line, we then calculate how the precision with which we can identify the line being read depends on the font size and spacing. The results indicate that, even after correcting for systematic bias, unnaturally large fonts and spacing may be required for reliable line identification.
Interestingly, during the experiments, the participants also repeatedly re-checked their task and whether they were looking at the correct line, leading to vertical jumps similar to those observed when reading code. This suggests that observed reading patterns may be “inefficient,” in the sense that participants feel the need to repeat actions beyond the minimal number apparently required for the task. This may have implications for the interpretation of reading patterns. In particular, reading does not reflect only the extraction of information from the text or code. Rather, reading patterns may also reflect other types of activities, such as getting a general orientation and searching for specific locations in the context of performing a particular task.
{"title":"Identifying Lines and Interpreting Vertical Jumps in Eye Tracking Studies of Reading Text and Code","authors":"Mor Shamy, Dror G. Feitelson","doi":"https://dl.acm.org/doi/10.1145/3579357","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3579357","url":null,"abstract":"<p>Eye tracking studies have shown that reading code, in contradistinction to reading text, includes many vertical jumps. As different lines of code may have quite different functions (e.g., variable definition, flow control, or computation), it is important to accurately identify the lines being read. We design experiments that require a specific line of text to be scrutinized. Using the distribution of gazes around this line, we then calculate how the precision with which we can identify the line being read depends on the font size and spacing. The results indicate that, even after correcting for systematic bias, unnaturally large fonts and spacing may be required for reliable line identification.</p><p>Interestingly, during the experiments, the participants also repeatedly re-checked their task and if they were looking at the correct line, leading to vertical jumps similar to those observed when reading code. This suggests that observed reading patterns may be “inefficient,” in the sense that participants feel the need to repeat actions beyond the minimal number apparently required for the task. This may have implications regarding the interpretation of reading patterns. In particular, reading does not reflect only the extraction of information from the text or code. Rather, reading patterns may also reflect other types of activities, such as getting a general orientation, and searching for specific locations in the context of performing a particular task.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"9 4","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-11. DOI: https://dl.acm.org/doi/10.1145/3570904
Thomas Howard, Karina Driller, William Frier, Claudio Pacchierotti, Maud Marchal, Jessica Hartcher-O’Brien
Ultrasound mid-air haptic (UMH) devices are a novel tool for haptic feedback, capable of providing localized vibrotactile stimuli to users at a distance. UMH applications largely rely on generating tactile shape outlines on the users’ skin. Here we investigate how to achieve sensations of continuity or gaps within such two-dimensional curves by studying the perception of pairs of amplitude-modulated focused ultrasound stimuli. On the one hand, we aim to investigate perceptual effects that may arise from providing simultaneous UMH stimuli. On the other hand, we wish to provide perception-based rendering guidelines for generating continuous or discontinuous sensations of tactile shapes. Finally, we hope to contribute toward a measure of the perceptually achievable resolution of UMH interfaces. We performed a user study to identify how far apart two focal points need to be to elicit a perceptual experience of two distinct stimuli separated by a gap. Mean gap detection thresholds were found at 32.3-mm spacing between focal points, but a high within- and between-subject variability was observed. Pairs spaced below 15 mm were consistently (>95%) perceived as a single stimulus, while pairs spaced 45 mm apart were consistently (84%) perceived as two separate stimuli. To investigate the observed variability, we resort to acoustic simulations of the resulting pressure fields. These show a non-linear evolution of actual peak pressure spacing as a function of nominal focal point spacing. Beyond an initial threshold in spacing (between 15 and 18 mm), which we believe to be related to the perceived size of a focal point, the probability of detecting a gap between focal points appears to linearly increase with spacing. Our work highlights physical interactions and perceptual effects to consider when designing or investigating the perception of UMH shapes.
{"title":"Gap Detection in Pairs of Ultrasound Mid-air Vibrotactile Stimuli","authors":"Thomas Howard, Karina Driller, William Frier, Claudio Pacchierotti, Maud Marchal, Jessica Hartcher-O’Brien","doi":"https://dl.acm.org/doi/10.1145/3570904","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3570904","url":null,"abstract":"<p>Ultrasound mid-air haptic (UMH) devices are a novel tool for haptic feedback, capable of providing localized vibrotactile stimuli to users at a distance. UMH applications largely rely on generating tactile shape outlines on the users’ skin. Here we investigate how to achieve sensations of continuity or gaps within such two-dimensional curves by studying the perception of pairs of amplitude-modulated focused ultrasound stimuli. On the one hand, we aim to investigate perceptual effects that may arise from providing simultaneous UMH stimuli. On the other hand, we wish to provide perception-based rendering guidelines for generating continuous or discontinuous sensations of tactile shapes. Finally, we hope to contribute toward a measure of the perceptually achievable resolution of UMH interfaces. We performed a user study to identify how far apart two focal points need to be to elicit a perceptual experience of two distinct stimuli separated by a gap. Mean gap detection thresholds were found at 32.3-mm spacing between focal points, but a high within- and between-subject variability was observed. Pairs spaced below 15 mm were consistently (>95%) perceived as a single stimulus, while pairs spaced 45 mm apart were consistently (84%) perceived as two separate stimuli. To investigate the observed variability, we resort to acoustic simulations of the resulting pressure fields. These show a non-linear evolution of actual peak pressure spacing as a function of nominal focal point spacing. Beyond an initial threshold in spacing (between 15 and 18 mm), which we believe to be related to the perceived size of a focal point, the probability of detecting a gap between focal points appears to linearly increase with spacing. Our work highlights physical interactions and perceptual effects to consider when designing or investigating the perception of UMH shapes.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"51 3","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents with a shared platform to collaborate in a spatial context. Although traditional face-to-face communication is limited by users’ proximity, meaning that another human’s non-verbal embodied cues become more difficult to perceive the farther one is away from that person, researchers and practitioners have started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complementary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncanniness” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.
{"title":"Virtual Big Heads in Extended Reality: Estimation of Ideal Head Scales and Perceptual Thresholds for Comfort and Facial Cues","authors":"Zubin Choudhary, Austin Erickson, Nahal Norouzi, Kangsoo Kim, Gerd Bruder, Gregory Welch","doi":"https://dl.acm.org/doi/10.1145/3571074","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3571074","url":null,"abstract":"<p>Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Although traditional face-to-face communication is limited by users’ proximity, meaning that another human’s non-verbal embodied cues become more difficult to perceive the farther one is away from that person, researchers and practitioners have started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the <i>Big Head</i> technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complimentary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncannniness” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.</p>","PeriodicalId":50921,"journal":{"name":"ACM Transactions on Applied Perception","volume":"51 2","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138504108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}