Inter- and intra-annual variation in the frequency of blue whale songs in Aotearoa New Zealand
Dawn R Barlow, Holger Klinck, Leigh G Torres. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0041948

Blue whales produce stereotyped songs that are declining in fundamental frequency worldwide. We examine inter- and intra-annual frequency variation in the New Zealand population, which can be monitored year-round in the South Taranaki Bight. We document declines of 0.13 Hz/yr between 1964 and 2025 and 0.08 Hz/yr between 2016 and 2025. Furthermore, we demonstrate higher fundamental frequencies from February to June and a significant positive relationship between fundamental frequency and the song intensity index. These results indicate that blue whales sing at higher fundamental frequencies during the putative breeding season. Future work should aim to resolve relationships between song and morphological, demographic, behavioral, and cultural factors.
Binaural conversion of monaural music recordings using head-related impulse responses
Yuya Hosoda, Kazuma Fujita, Ryota Shimokura, Youji Iiguni. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042184

This Letter investigates the interplay between source-separation quality and head-related impulse response (HRIR)-based binaural rendering for monaural music recordings, with the aim of providing design guidelines for monaural-to-binaural processing. Instrument signals are separated from monaural mixtures using nonnegative matrix factorization, and listener-specific HRIRs are then convolved with the separated sources to synthesize binaural audio in which each instrument is assigned a distinct direction. Subjective listening tests indicate that when source-separation accuracy is sufficiently high, the HRIR-based conversion improves localization accuracy while maintaining comparable sound quality and localization stability. Results also reveal how residual components from imperfect source separation degrade binaural rendering.
Identification of natural and man-made sounds in various scene complexities with temporal and spatial separation
Song Hui Chon, Wesley A Bulla, Sihyeon Park. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042197

The present investigation revisited fundamental principles of auditory scene analysis via detection, discrimination, and identification of natural and man-made sounds. Soundscapes, both urban and natural, often contain multiple competing sound sources. An experiment was conducted to investigate whether temporal or spatial spreading of concurrent sounds would improve identification accuracy. This paper presents an analysis of natural and man-made sounds and examines their impact on identification accuracy, considering scene complexity and presentation format. Results suggest that sounds with a longer effective duration and a later energy peak tend to be identified more accurately, among both natural and man-made sounds.
Rényi entropy-based detection of North Atlantic right whale gunshot vocalizations
Artorix de la Cruz, Mae L Seto. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042199

A Rényi spectral entropy is introduced for environment-adaptive detection of underwater acoustic signals in low signal-to-noise-ratio conditions. The Rényi order α is a control parameter: varying α enhances the contrast between signals and background noise and improves detectability. The entropy variance as a function of α exhibits four regimes separated by crossover points, from which an optimal interval for signal selection is identified. This establishes α as a single tunable parameter for optimizing detection performance. The approach is validated on passive recordings of North Atlantic right whale gunshots and outperforms a detector based on the Shannon spectral entropy.
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042198

The cochlear implant (CI) is a successful biomedical device that enables individuals with severe-to-profound hearing loss to perceive sound through electrical stimulation, yet listening in noise remains challenging. Recent advances in deep learning show promise for CI sound coding, particularly through the integration of visual cues. In this study, an audio-visual speech enhancement (AVSE) module is integrated with the ElectrodeNet-CS (ECS) model to form an end-to-end CI system, AVSE-ECS. Simulations show that the jointly trained AVSE-ECS system achieves high objective speech intelligibility and improves the signal-to-error ratio by 7.4666 dB compared to the advanced combination encoder strategy. These findings underscore the potential of AVSE-based CI sound coding.
Effects of gemination and voicing on f0 in Italian across varying speaking rates
Francesco Burroni, Sireemas Maspong, James Kirby. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042293

This study examines whether Italian geminate consonants influence the fundamental frequency (f0) of adjacent vowels and how such effects interact with voicing-related Cf0 perturbations. Audio and articulatory data from 23 speakers show that vowels following geminates have slightly higher f0 than those following singletons, regardless of voicing. No effect was found on preceding vowels. The f0 differences remained stable across speaking rates and articulatory cues. A positive correlation was found between f0 and intensity, consistent with a mechanical, aerodynamic basis for the effect. However, some individual patterns show correlations between f0 and other properties, which may indicate controlled enhancements. The findings are consistent with a hybrid account in which both articulatory contingencies and speaker-specific enhancements may influence co-intrinsic pitch.
Acoustic behavior and diversity of fish calling in the Channel Islands
Annebelle C M Kok, Caroline Soderstjerna, Ella B Kim, John E Joseph, Tetyana Margolina, Lindsey E Peavey Reeves, Leila T Hatch, Simone Baumann-Pickering. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042167

Many fish species are suspected to produce sound, but the variety of sounds they produce is still largely undocumented. This study investigated the presence and diversity of fish sounds in the Channel Islands National Marine Sanctuary. Besides regular sounds from three known species (bocaccio, Sebastes paucispinis; plainfin midshipman, Porichthys notatus; and white seabass, Atractoscion nobilis), two unusual sound types were observed, designated UF200 and UF450 ("UF" for unidentified fish); these require further analysis, and the species producing them remain unknown. Sound types had distinct acoustic signatures and varied in diel presence. Monitoring fish sounds is a promising technique for non-invasive monitoring of fish presence, with conservation applications.
Speech rate and the perception of consonant and vowel length in Japanese: Neural entrainment and episodic memory
Timothy Gadanidis, Yoonjung Kang. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0041994

Neural entrainment (the persistence of speech rhythm-dependent neural oscillations) and episodic memory-based perception (exemplar theory and belief-updating models) have been proposed to explain speech rate-dependent perception. To test these accounts, we examined the rate-dependent perception of Japanese vowel and stop length contrasts while varying speech rate by manipulating all segments, vowels only, or consonants only in the carrier sentence. Robust rate effects were observed across all conditions, consistent with the neural entrainment account. In addition, the manipulation method modulated rate effects for stops but not for vowels, offering partial support for the episodic account.
Vibrotactile signals can aid recognition of spectrally degraded speech signals
Susan Nittrouer, D H Whalen, Wei-Rong Chen. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042196

The potential utility of providing the low-frequency portion of speech signals through vibrotactile stimulation, as an aid to speech recognition by cochlear implant recipients, was examined. Sixty-five young adults with normal hearing heard four-channel noise-vocoded sentences high-pass filtered above 0.25 kHz, as well as those noise-vocoded sentences combined with the original signal low-pass filtered below 0.25 kHz, presented through either auditory or vibrotactile stimulation. Improved speech recognition was observed for both groups, but the effects were smaller for participants in the vibrotactile group than for those in the auditory group. Future research should explore ways of enhancing the vibrotactile signal.
Erratum: Sound quality, not speech recognition, explains cochlear implant-related quality of life outcomes [JASA Express Lett. 5, 104401 (2025)]
Katelyn A Berg, Hugh M Birky, Victoria A Sevich, Aaron C Moberly, Terrin N Tamati. JASA Express Letters 6(1), 2026. https://doi.org/10.1121/10.0042231