Social interactions often require the simultaneous processing of emotions from facial expressions and speech. However, the development of the gaze behavior used for emotion recognition, and the effects of speech perception on the visual encoding of facial expressions, are less well understood. We therefore conducted a word-primed face categorization experiment, in which participants from multiple age groups (six-year-olds, 12-year-olds, and adults) categorized target facial expressions as positive or negative after priming with valence-congruent or -incongruent auditory emotion words, or no words at all. We recorded our participants' gaze behavior during this task using an eye-tracker, and analyzed the data with respect to the fixation time toward the eyes and mouth regions of faces, as well as the time until participants made the first fixation within those regions (time to first fixation, TTFF). We found that the six-year-olds showed significantly higher accuracy in categorizing congruently primed faces compared to the other conditions. The six-year-olds also showed faster response times, shorter total fixation durations, and shorter TTFFs in all primed trials, regardless of congruency, as compared to unprimed trials. We also found that while adults looked first, and longer, at the eyes as compared to the mouth regions of target faces, children did not exhibit this gaze behavior. Our results thus indicate that young children are more sensitive than adults or older children to auditory emotion word primes during the perception of emotional faces, and that the distribution of gaze across the regions of the face changes significantly from childhood to adulthood.
A critical component of many immersive experiences in virtual reality (VR) is vection, defined as the illusion of self-motion. Traditionally, vection has been described as a visual phenomenon, but more recent research suggests that vection can be influenced by a variety of senses. The goal of the present study was to investigate the role of multisensory cues in vection by manipulating the availability of visual, auditory, and tactile stimuli in a VR setting. To achieve this, 24 adults (mean age = 25.04 years) were presented with a rotating stimulus designed to induce circular vection. All participants completed trials that included a single sensory cue, a combination of two cues, or all three cues presented together. The size of the field of view (FOV) was manipulated across four levels (no-visuals, small, medium, full). Participants rated vection intensity and duration verbally after each trial. Results showed that all three sensory cues induced vection when presented in isolation, with visual cues eliciting the highest intensity and longest duration. The presence of auditory and tactile cues further increased vection intensity and duration compared to conditions where these cues were not presented. These findings support the idea that vection can be induced via multiple types of sensory inputs and can be intensified when multiple sensory inputs are combined.
Sound symbolism refers to the association between the sounds of words and their meanings, often studied using the crossmodal correspondence between auditory pseudowords, e.g., 'takete' or 'maluma', and pointed or rounded visual shapes, respectively. In a functional magnetic resonance imaging study, participants were presented with pseudoword-shape pairs that were sound-symbolically congruent or incongruent. We found no significant congruency effects in the blood oxygenation level-dependent (BOLD) signal when participants were attending to visual shapes. During attention to auditory pseudowords, however, we observed greater BOLD activity for incongruent compared to congruent audiovisual pairs bilaterally in the intraparietal sulcus and supramarginal gyrus, and in the left middle frontal gyrus. We compared this activity to independent functional contrasts designed to test competing explanations of sound symbolism, but found no evidence for mediation via language, and only limited evidence for accounts based on multisensory integration and a general magnitude system. Instead, we suggest that the observed incongruency effects are likely to reflect phonological processing and/or multisensory attention. These findings advance our understanding of sound-to-meaning mapping in the brain.
Exploring the world through touch requires the integration of internal (e.g., anatomical) and external (e.g., spatial) reference frames: you only know what you touch when you know where your hands are in space. The deficit observed in tactile temporal-order judgements when the hands are crossed over the midline provides one tool to explore this integration. We used foot pedals and required participants to focus on either the hand that was stimulated first (an anatomical bias condition) or the location of the hand that was stimulated first (a spatiotopic bias condition). Spatiotopic-based responses produce a larger crossed-hands deficit, presumably by focusing observers on the external reference frame. In contrast, anatomical-based responses focus the observer on the internal reference frame and produce a smaller deficit. This manipulation thus provides evidence that observers can change the relative weight given to each reference frame. We quantify this effect using a probabilistic model that produces a population estimate of the relative weight given to each reference frame. We show that a spatiotopic bias can result in either a larger external weight (Experiment 1) or a smaller internal weight (Experiment 2) and provide an explanation of when each one would occur.
Mounting evidence demonstrates that people make surprisingly consistent associations between auditory attributes and a number of the commonly-agreed basic tastes. However, the sonic representation of (association with) saltiness has remained rather elusive. In the present study, a crowd-sourced online experiment (n = 1819 participants) was conducted to determine the acoustical/musical attributes that best match saltiness, as well as participants' confidence levels in their choices. Based on previous literature on crossmodal correspondences involving saltiness, thirteen attributes were selected to cover a variety of temporal, tactile, and emotional associations. The results revealed that saltiness was associated most strongly with a long decay time, high auditory roughness, and a regular rhythm. In terms of emotional associations, saltiness was matched with negative valence, high arousal, and minor mode. Moreover, significantly higher average confidence ratings were observed for those saltiness-matching choices for which there was majority agreement, suggesting that individuals were more confident about their own judgments when these matched the group response, thereby providing support for the so-called 'consensuality principle'. Taken together, these results help to uncover the complex interplay of mechanisms behind seemingly surprising crossmodal correspondences between sound attributes and taste.
Previous studies have examined whether audio-visual integration changes in older age, with some studies reporting age-related differences and others reporting no differences. Most studies have either used very basic and ambiguous stimuli (e.g., flash/beep) or highly contextualized, causally related stimuli (e.g., speech). However, few have used tasks that fall somewhere between the extremes of this continuum, such as those that include contextualized, causally related stimuli that are not speech-based; for example, audio-visual impact events. The present study used a paradigm requiring duration estimates and temporal order judgements (TOJ) of audio-visual impact events. Specifically, the Schutz-Lipscomb illusion, in which the perceived duration of a percussive tone is influenced by the length of the visual striking gesture, was examined in younger and older adults. Twenty-one younger and 21 older adult participants were presented with a visual point-light representation of a percussive impact event (i.e., a marimbist striking their instrument with a long or short gesture) combined with a percussive auditory tone. Participants completed a tone duration judgement task and a TOJ task. Five audio-visual temporal offsets (-400 to +400 ms) and five spatial offsets (-90° to +90°) were randomly introduced. Results demonstrated that the strength of the illusion did not differ between older and younger adults and was not influenced by spatial or temporal offsets. Older adults showed an 'auditory first bias' when making TOJs. The current findings expand what is known about age-related differences in audio-visual integration by considering them in the context of impact-related events.
We examine whether crossmodal correspondences (CMCs) modulate perceptual disambiguation by considering the influence of lightness/pitch congruency on the perceptual resolution of the Rubin face/vase (RFV). We randomly paired a black-and-white RFV (black faces and white vase, or vice versa) with either a high or low pitch tone and found that CMC congruency biases the dominant visual percept. The perceptual option that was CMC-congruent with the tone (white/high pitch or black/low pitch) was reported significantly more often than the perceptual option that was CMC-incongruent with the tone (white/low pitch or black/high pitch). However, the effect was only observed for stimuli presented for longer, and not shorter, durations, suggesting a perceptual effect rather than a response bias; moreover, we infer an effect on perceptual reversals rather than initial percepts. We found that the CMC congruency effect for longer-duration stimuli only occurred after several minutes of prior exposure to the stimuli, suggesting that the CMC congruency develops over time. These findings extend the observed effects of CMCs from relatively low-level feature-based effects to higher-level object-based perceptual effects (specifically, resolving ambiguity) and demonstrate that an entirely new category of crossmodal factors (CMC congruency) influences perceptual disambiguation in bistability.
Reliability-based cue combination is a hallmark of multisensory integration, whereas the role of cue reliability in crossmodal recalibration is less well understood. The present study investigated whether visual cue reliability affects audiovisual recalibration in adults and children. Participants had to localize sounds, which were presented either alone or in combination with a spatially discrepant high- or low-reliability visual stimulus. In a previous study we had shown that the ventriloquist effect (indicating multisensory integration) was overall larger in the child groups and that the shift in sound localization toward the spatially discrepant visual stimulus decreased with visual cue reliability in all groups. The present study replicated the onset of the immediate ventriloquist aftereffect (a shift in unimodal sound localization following a single exposure to a spatially discrepant audiovisual stimulus) at the age of 6-7 years. In adults the immediate ventriloquist aftereffect depended on visual cue reliability, whereas the cumulative ventriloquist aftereffect (reflecting the audiovisual spatial discrepancies over the complete experiment) did not. In 6-7-year-olds the immediate ventriloquist aftereffect was independent of visual cue reliability. The present results are compatible with the idea that immediate and cumulative crossmodal recalibration are dissociable processes and that the immediate ventriloquist aftereffect is more closely related to genuine multisensory integration.