Deep convolutional neural networks (DCNNs) have achieved human-level accuracy in face identification (Phillips et al., 2018), though it is unclear how accurately they discriminate highly similar faces. Here, humans and a DCNN performed a challenging face-identity matching task that included identical twins. Participants (N = 87) viewed pairs of face images of three types: same-identity, general imposters (different identities from similar demographic groups), and twin imposters (identical twin siblings). The task was to determine whether the pairs showed the same person or different people. Identity comparisons were tested in three viewpoint-disparity conditions: frontal to frontal, frontal to 45-degree profile, and frontal to 90-degree profile. Accuracy for discriminating matched-identity pairs from twin-imposter pairs and general-imposter pairs was assessed in each viewpoint-disparity condition. Humans were more accurate for general-imposter pairs than twin-imposter pairs, and accuracy declined with increased viewpoint disparity between the images in a pair. A DCNN trained for face identification (Ranjan et al., 2018) was tested on the same image pairs presented to humans. Machine performance mirrored the pattern of human accuracy, but with performance at or above all humans in all but one condition. Human and machine similarity scores were compared across all image-pair types. This item-level analysis showed that human and machine similarity ratings correlated significantly in six of nine image-pair types [range r = 0.38 to r = 0.63], suggesting general accord between the perception of face similarity by humans and the DCNN. These findings also contribute to our understanding of DCNN performance for discriminating high-resemblance faces, demonstrate that the DCNN performs at a level at or above humans, and suggest a degree of parity between the features used by humans and the DCNN.
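The item-level analysis described above rests on two standard quantities: a per-pair similarity score (for a DCNN, commonly the cosine similarity between the two face embeddings) and a Pearson correlation between the human and machine scores across items. A minimal sketch in plain Python; the function names are illustrative and not taken from the paper:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two face-descriptor vectors, a common
    item-level similarity score for a DCNN face-verification system."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores, e.g. human
    similarity ratings vs. machine similarity scores per image pair."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In this setup, a higher cosine similarity should accompany same-identity pairs, and the reported r values (0.38 to 0.63) would come from applying `pearson_r` within each image-pair type.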
Electrovibration is used in touch-enabled devices to render different textures. Tactile sub-modal stimuli can enhance texture perception when presented along with electrovibration stimuli, and the perception of texture depends on the electrovibration threshold. In the current study, we conducted a psychophysical experiment with 13 participants to investigate the effect of introducing a subthreshold electrotactile stimulus (SES) on the perception of electrovibration. When tactile sub-modal stimuli interact, one stimulus can mask the perception of another; this study explored whether an electrotactile stimulus produces such tactile masking of electrovibration. The results indicate a reduction of the electrovibration threshold by 12.46% and 6.75% when the electrotactile stimulus was at 90% and 80% of its perception threshold, respectively. This method was tested over a wide range of frequencies, from 20 Hz to 320 Hz in the tuning curve, and the variation in percentage reduction with frequency is reported. Another experiment was conducted to measure the perception of the combined stimuli on a Likert scale. The results showed that perception was more inclined towards electrovibration when the SES was at 80% of its perception threshold and was indifferent at 90%. The reduction in the electrovibration threshold reveals that tactile masking by the electrotactile stimulus was not prevalent under subthreshold conditions. This study provides significant insights for developing a texture rendering algorithm based on tactile sub-modal stimuli in the future.
We investigate the optimal aesthetic location and size of a single dominant salient region in a photographic image. Existing algorithms for photographic composition do not take full account of the spatial positioning or sizes of these salient regions. We present a set of experiments to assess aesthetic preferences, inspired by theories of centeredness, principal lines, and Rule-of-Thirds. Our experimental results show a clear preference for the salient region to be centered in the image and that there is a preferred size of non-salient border around this salient region. We thus propose a novel image cropping mechanism for images containing a single salient region to achieve the best aesthetic balance. Our results show that the Rule-of-Thirds guideline is not generally valid but also allow us to hypothesize in which situations it is useful and in which it is inappropriate.
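The cropping mechanism itself is not specified in the abstract, but the core idea of centering a single salient region while leaving a preferred amount of non-salient border can be sketched as follows. This is an illustrative sketch, not the paper's algorithm; `border_ratio` is a hypothetical tuning parameter:

```python
def center_crop_for_salient_region(img_w, img_h, box, border_ratio=0.5):
    """Compute a crop that centers a salient bounding box and leaves a
    non-salient border proportional to the region's size.

    box is (x0, y0, x1, y1) in pixels; border_ratio is the fraction of
    the region's half-extent added as border on each side (an assumed
    parameter, not a value from the study).
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2   # salient-region center
    w, h = x1 - x0, y1 - y0
    half_w = (w / 2) * (1 + 2 * border_ratio)
    half_h = (h / 2) * (1 + 2 * border_ratio)
    # Clamp the crop window to the image bounds.
    left = max(0, cx - half_w)
    top = max(0, cy - half_h)
    right = min(img_w, cx + half_w)
    bottom = min(img_h, cy + half_h)
    return (left, top, right, bottom)
```

Clamping to the image bounds means the salient region is only approximately centered when it lies near an edge, which is one situation where a rule other than centering might apply.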
A foveated image can be entirely reconstructed from a sparse set of samples distributed according to the retinal sensitivity of the human visual system, which rapidly decreases with increasing eccentricity. The use of generative adversarial networks (GANs) has recently been shown to be a promising solution for such a task, as they can successfully hallucinate missing image information. As in the case of other supervised learning approaches, the definition of the loss function and the training strategy heavily influence the quality of the output. In this work, we consider the problem of efficiently guiding the training of foveated reconstruction techniques such that they are more aware of the capabilities and limitations of the human visual system, and thus can reconstruct visually important image features. Our primary goal is to make the training procedure less sensitive to distortions that humans cannot detect and to focus on penalizing perceptually important artifacts. Given the nature of GAN-based solutions, we focus on the sensitivity of human vision to hallucination in the case of input samples with different densities. We propose psychophysical experiments, a dataset, and a procedure for training foveated image reconstruction. The proposed strategy renders the generator network flexible by penalizing only perceptually important deviations in the output. As a result, the method emphasizes the recovery of perceptually important image features. We evaluated our strategy and compared it with alternative solutions by using a newly trained objective metric, a recent foveated video quality metric, and user experiments. Our evaluations revealed significant improvements in the perceived image reconstruction quality compared with the standard GAN-based training approach.
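The strategy of penalizing only perceptually important deviations amounts to weighting reconstruction error by how detectable each deviation is. A conceptual sketch (not the paper's actual loss) of a per-pixel error weighted by a perceptual-importance map:

```python
def masked_reconstruction_loss(output, target, weight):
    """Weighted L1 reconstruction loss over flattened pixel lists.

    `weight` is a perceptual-importance map: near 0 where distortions
    are undetectable (e.g., high eccentricity, sparse sampling), near 1
    where artifacts would be visible.  A conceptual sketch under those
    assumptions, not the loss used in the paper.
    """
    assert len(output) == len(target) == len(weight)
    total = sum(w * abs(o - t) for o, t, w in zip(output, target, weight))
    # Normalize by total weight so fully masked regions do not dilute
    # the penalty on perceptually important ones.
    return total / sum(weight)
```

With a uniform weight map this reduces to the ordinary mean absolute error, so the masking only changes behavior where the importance map actually varies.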
Eye tracking studies have shown that reading code, in contradistinction to reading text, includes many vertical jumps. As different lines of code may have quite different functions (e.g., variable definition, flow control, or computation), it is important to accurately identify the lines being read. We design experiments that require a specific line of text to be scrutinized. Using the distribution of gazes around this line, we then calculate how the precision with which we can identify the line being read depends on the font size and spacing. The results indicate that, even after correcting for systematic bias, unnaturally large fonts and spacing may be required for reliable line identification.
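Under a simple model in which vertical gaze error is Gaussian with some standard deviation plus a residual bias after correction, the probability of assigning a gaze sample to the correct line grows with line height (font size plus spacing). A sketch of that relationship; the model and all parameter values are illustrative assumptions, not results from the study:

```python
from math import erf, sqrt

def line_id_accuracy(line_height_px, gaze_sd_px, bias_px=0.0):
    """Probability that one gaze sample lands inside the vertical band
    of the line actually being read, assuming normally distributed
    vertical gaze error (sd = gaze_sd_px) with residual bias bias_px.
    """
    half = line_height_px / 2
    # Standard normal CDF evaluated at x / gaze_sd_px.
    cdf = lambda x: 0.5 * (1 + erf(x / (gaze_sd_px * sqrt(2))))
    return cdf(half - bias_px) - cdf(-half - bias_px)
```

In this toy model, accuracy approaches 1 only when the line height is several times the gaze-error standard deviation, which is consistent with the abstract's observation that unnaturally large fonts and spacing may be required for reliable line identification.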
Interestingly, during the experiments, the participants also repeatedly re-checked their task and whether they were looking at the correct line, leading to vertical jumps similar to those observed when reading code. This suggests that observed reading patterns may be "inefficient," in the sense that participants feel the need to repeat actions beyond the minimal number apparently required for the task. This may have implications for the interpretation of reading patterns. In particular, reading does not reflect only the extraction of information from the text or code. Rather, reading patterns may also reflect other types of activities, such as getting a general orientation and searching for specific locations in the context of performing a particular task.