Correction: EURASIP Journal on Audio, Speech, and Music Processing 2023, 46 (2023)
https://doi.org/10.1186/s13636-023-00310-w
Following publication of the original article [1], we were notified that in Figure 14, each cluster subfigure contained an additional bottom row. These rows have been removed.
Originally published Figure 14:
Corrected Figure 14:
The original article has been corrected.
Kindt et al., Robustness of ad hoc microphone clustering using speaker embeddings: evaluation under realistic and challenging scenarios. EURASIP J. Audio Speech Music Process. 2023, 46 (2023). https://doi.org/10.1186/s13636-023-00310-w
IDLab, Department of Electronics and Information Systems, Ghent University - Imec, Ghent, Belgium
Stijn Kindt, Jenthe Thienpondt & Nilesh Madhu
Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
Luca Becker
Correspondence to Stijn Kindt.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License.
Nowadays, we are surrounded by a plethora of recording devices, including mobile phones, laptops, tablets, smartwatches, and camcorders. However, conventional multichannel signal processing methods usually cannot be applied to jointly process the signals recorded by multiple distributed devices, because synchronous recording is essential. Commercially available microphone array processing is therefore currently limited to a single device on which all microphones are mounted. Fully exploiting the spatial diversity offered by multiple audio devices without requiring wired networking remains a major challenge, whose potential practical and commercial benefits have prompted significant research efforts over the past decade.
Wireless acoustic sensor networks (WASNs) have emerged as a new paradigm of acoustic sensing that overcomes the limitations of individual devices. By wirelessly connecting microphone nodes, and by addressing the new challenges of asynchronous channels, unknown microphone positions, and distributed computing, a WASN allows many recording devices to be distributed in space, covering a wider area and forming an extended microphone array from its nodes. This promises to significantly improve performance on various audio tasks such as speech enhancement, speech recognition, diarization, scene analysis, and anomalous acoustic event detection.
For this special issue, six papers were accepted, all of which address the above-mentioned fundamental challenges of WASNs. First, the question of which sensors should be used for a specific signal processing task or for the extraction of a target source is addressed in the papers by Guenther et al. and Kindt et al. Given a set of sensors, a method for their waveform-level synchronization in dynamic scenarios is presented by Chinaev et al., and a localization method using both sensor signals and higher-level environmental information is discussed by Grinstein et al. Finally, robust speaker counting and source separation are addressed by Hsu and Bai, and the task of removing specific interference from a single sensor signal is tackled by Kawamura et al.
The paper ‘Microphone utility estimation in acoustic sensor networks using single-channel signal features’ by Guenther et al. proposes a method to assess the utility of individual sensors of a WASN for coherence-based signal processing, e.g., beamforming or blind source separation, by using appropriate single-channel signal features as proxies for waveforms. Thereby, waveforms need not be transmitted to identify suitable sensors for a synchronized cluster, and the amount of transmitted data can be reduced by several orders of magnitude. It is shown that both estimation-theoretic processing of single-channel features and deep learning-based identification of such features lead to measures of coherence in the feature space that reflect the suitability of distributed sensors.
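To make the idea of feature-space proxies concrete, the following is a minimal illustrative sketch, not the authors' method: each node computes a cheap single-channel feature (here, a short-time log-energy envelope), and the correlation of feature trajectories across nodes is used as a stand-in for waveform-level coherence when ranking microphone utility. All function names and parameters are hypothetical.

```python
# Illustrative sketch (assumed setup, not the paper's algorithm): rank microphones
# by how well their single-channel feature trajectories correlate with a reference
# node, so that raw waveforms never need to be transmitted.
import numpy as np

def frame_log_energy(x, frame_len=512, hop=256):
    """Short-time log-energy envelope: one scalar per frame instead of 512 samples."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-12)

def feature_utility(signals, ref=0):
    """Pearson correlation of each node's envelope with the reference node's,
    used here as a proxy for waveform-level coherence."""
    feats = [frame_log_energy(s) for s in signals]
    r = feats[ref]
    return [float(np.corrcoef(r, f)[0, 1]) for f in feats]

rng = np.random.default_rng(0)
t = np.arange(16000)
env = 1.0 + np.sin(2 * np.pi * t / 2000) ** 2        # slowly varying amplitude
src = env * rng.standard_normal(16000)               # amplitude-modulated source
near = src + 0.1 * rng.standard_normal(16000)        # node close to the source
far = 0.2 * src + rng.standard_normal(16000)         # node dominated by local noise
u = feature_utility([src, near, far])
# the near node scores higher than the noise-dominated far node
```

Transmitting one envelope value per 256-sample hop instead of the samples themselves already reduces the data rate by more than two orders of magnitude, which is the kind of saving the paper targets.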