Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909646
Renke Wang, Jun-Jie Huang, P. Dragotti
In this paper, we propose a novel single image super-resolution algorithm that integrates a model-based approach with self-learning deep networks. The proposed method can be adapted to low-resolution (LR) images obtained with real acquisition devices where the point spread function is Gaussian-like. By modelling natural image lines as piece-wise smooth functions and approximating the blurring kernel with B-splines, an intermediate high-resolution (HR) image is first obtained based on Finite Rate of Innovation (FRI) theory. A self-supervised deep recursive residual network is then applied to further enhance the reconstruction quality. Simulation results show that our algorithm outperforms other self-learning algorithms and achieves state-of-the-art performance.
Title: FRISPEE: FRI-Based Single Image Super-Resolution with Deep Recursive Residual Network. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909799
Eleftherios Kofidis, Paris V. Giampouras, A. Rontogiannis
The block-term tensor decomposition (BTD) model has been receiving increasing attention as a quite flexible way to capture the structure of 3-dimensional data that can be naturally viewed as the superposition of $R$ block terms of multilinear rank $(L_{r}, L_{r}, 1)$, $r=1,2,\ldots,R$. Versions with nonnegativity constraints, especially relevant in applications like blind source separation, have only recently been proposed, and they all require a-priori knowledge of the number of block terms, $R$, and their individual ranks, $L_{r}$. Clearly, this requirement may severely limit their practical applicability. Building upon our earlier work on unconstrained BTD model selection and computation, we develop for the first time in this paper a method for nonnegative BTD approximation that is also rank-revealing. The idea is to impose column sparsity jointly on the factors and successively estimate the ranks as the numbers of factor columns of non-negligible magnitude. This is effected with the aid of nonnegative alternating iteratively reweighted least squares, implemented via projected Newton updates for increased convergence rate and accuracy. Simulation results are reported that demonstrate the effectiveness of our method in accurately estimating both the ranks and the factors of the nonnegative least squares BTD approximation.
Title: A Projected Newton-type Algorithm for Rank-revealing Nonnegative Block-Term Tensor Decomposition. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
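For reference, the rank-$(L_{r}, L_{r}, 1)$ block-term model that the abstract refers to is commonly written as below. The factor names $\mathbf{A}_r$, $\mathbf{B}_r$, $\mathbf{c}_r$ and the dimensions $I$, $J$, $K$ are notation assumed here for illustration and do not come from the abstract; $\circ$ denotes the outer product.

$$
\mathcal{X} \approx \sum_{r=1}^{R} \left( \mathbf{A}_r \mathbf{B}_r^{\mathsf{T}} \right) \circ \mathbf{c}_r,
\qquad
\mathbf{A}_r \in \mathbb{R}_{+}^{I \times L_r},\;
\mathbf{B}_r \in \mathbb{R}_{+}^{J \times L_r},\;
\mathbf{c}_r \in \mathbb{R}_{+}^{K}
$$

In this notation, a rank-revealing method of the kind described would start from overestimates of $R$ and $L_{r}$ and read off the actual values from the factor columns whose magnitude remains non-negligible after sparsity is imposed.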
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909892
Lukas Schynol, M. Pesavento
The weighted sum-rate maximization in coordinated multicell MIMO networks with intra- and intercell interference and local channel state at the base stations is considered. Based on the concept of unrolling applied to the classical weighted minimum mean squared error (WMMSE) algorithm and ideas from graph signal processing, we present the GCN-WMMSE deep network architecture for transceiver design in multicell MU-MIMO interference channels with local channel state information. Similar to the original WMMSE algorithm, it facilitates a distributed implementation in multicell networks. However, GCN-WMMSE significantly accelerates the convergence and consequently alleviates the communication overhead in a distributed deployment. Additionally, the architecture is agnostic to different wireless network topologies while exhibiting a low number of trainable parameters and high efficiency w.r.t. training data.
Title: Deep Unfolding in Multicell MU-MIMO. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909594
Ecem Bozkurt, Antonio Ortega
We present a novel framework to represent sets of time-varying signals as dynamic graphs using the non-negative kernel (NNK) graph construction. We extend the original NNK framework to allow explicit delays as part of the graph construction, so that, unlike in NNK, two nodes can be connected with an edge corresponding to a non-zero time delay if there is higher similarity between the signals after shifting one of them. We also propose to characterize the similarity between signals at different nodes using the node degree and clustering coefficients of their respective visibility graphs. With graph edges that can represent temporal delays, we provide a new perspective that enables us to see the effect of synchronization in graph construction for time-series signals. For both temperature and EEG datasets, we show that our proposed approach can achieve sparse and interpretable graph representations. Furthermore, the proposed method can be useful in characterizing different EEG experiments using sparsity.
Title: Non-Negative Kernel Graphs for Time-Varying Signals Using Visibility Graphs. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
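To make the visibility-graph features mentioned in the abstract concrete, here is a minimal sketch that builds a visibility graph from one time series and computes node degrees and a local clustering coefficient. It assumes the natural visibility criterion (two samples are linked if the straight line between them passes above every sample in between); the abstract does not say which visibility variant the authors use, and the function names are hypothetical.

```python
def visibility_graph(series):
    """Build a natural visibility graph; returns {node index: set of neighbours}.

    Assumption: natural visibility criterion, not confirmed by the abstract.
    """
    n = len(series)
    neighbors = {i: set() for i in range(n)}
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Samples i and j see each other if every sample between them
            # lies strictly below the line joining (i, y_i) and (j, y_j).
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def node_degrees(neighbors):
    """Degree of every node, in sample order."""
    return [len(neighbors[i]) for i in range(len(neighbors))]


def clustering_coefficient(neighbors, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = sorted(neighbors[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for idx, a in enumerate(nbrs)
        for b in nbrs[idx + 1:]
        if b in neighbors[a]
    )
    return 2.0 * links / (k * (k - 1))


graph = visibility_graph([3, 1, 2, 1, 3])
print(node_degrees(graph))
print(clustering_coefficient(graph, 1))
```

Degree and clustering sequences computed this way for two signals could then be compared with any similarity measure; how the paper combines them with the NNK construction is not specified in the abstract.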
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909629
N. Stefanakis, Konstantinos Psaroulakis, Nikonas Simou, Christos Astaras
A pipeline for automatic detection of chainsaw events in audio recordings is presented as a means to detect illegal logging activity in a protected natural environment. We propose a two-step process that consists of an activity detector at the front end and a deep neural network (DNN) classifier at the back end. At the front end, we use the Summation of Residual Harmonics method in order to detect patterns with harmonic structure in the audio recording. Active audio segments are subsequently fed to the classifier, which decides upon the absence or presence of a chainsaw event. As the acoustic feature, we propose the widely used amplitude spectrogram, passed through the recently proposed Per-Channel Energy Normalization (PCEN) process. Results based on real-field recordings illustrate that the proposed end-to-end system may efficiently detect low-SNR chainsaw events at a very low false detection rate.
Title: An Open-Access System for Long-Range Chainsaw Sound Detection. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
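As an illustration of the PCEN step named in the abstract, the sketch below applies the standard PCEN formula, PCEN(t, f) = (E(t, f) / (eps + M(t, f))^alpha + delta)^r - delta^r, where M is a first-order smoothed version of the spectrogram E along time. The parameter defaults follow the values commonly quoted for PCEN; the values used in this system, the initialisation of the smoother, and the function name are assumptions made for illustration.

```python
def pcen(spectrogram, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-Channel Energy Normalization of a magnitude spectrogram.

    spectrogram: list of frames, each a list of non-negative values per channel.
    Parameter defaults are illustrative, not taken from the paper.
    """
    # Assumption: the smoother M is initialised with the first frame.
    smoothed = list(spectrogram[0])
    normalized = []
    for frame in spectrogram:
        # First-order IIR smoothing along time, independently per channel:
        # M(t, f) = (1 - s) * M(t - 1, f) + s * E(t, f)
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Adaptive gain control followed by root compression.
        normalized.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return normalized


frames = [[0.1, 0.2], [0.1, 5.0], [0.1, 0.2]]
for row in pcen(frames):
    print(row)
```

Because each channel is divided by its own smoothed energy, a slowly varying background is flattened while a sudden onset (the second frame, second channel in the toy input) stands out, which is the usual motivation for PCEN in low-SNR detection tasks.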
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909804
Stefan Thaleiser, G. Enzner
Optimal performance of many speech enhancement methods depends on accurate noise power-spectral density (PSD) estimation. While several approaches have proven to perform sufficiently well for stationary noises, such as white Gaussian or car noise, non-stationary noise types like wind noise are more challenging. In the binaural setting and in multichannel systems, the speech-blocking method is essential to recent developments in non-stationary noise estimation. It critically requires information about the acoustic channel transfer function from source to listener. In this paper, we propose such a noise-subspace approach for wind-noise PSD estimation, which relies on data-driven blind channel identification in speech presence and on a-priori acoustic channel information (i.e., the steering preset) in speech pauses, where the smooth transition between the two is controlled by the a-priori SNR. The algorithm is designed for fully online operation based on the current noisy frame input. It improves on straightforward recursive subspace analysis and on established single-channel estimation in the wind-noise scenario, while also dealing well with speech presence or babble noise.
Title: Binaural Wind-Noise Tracking with Steering Preset. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909659
Sehun Kim, Tomoki Hayashi, T. Toda
We propose a method that effectively generates a note-level transcription from a guitar sound signal. In recent years, there have been many successful guitar transcription systems. However, most of them generate a frame-level transcription rather than a note-level transcription. Furthermore, it is usually difficult to effectively model long-term characteristics. To address these problems, we propose a novel model architecture using an attention mechanism along with a convolutional neural network (CNN). Our model is capable of modeling both short-term and long-term characteristics of a guitar sound signal and a corresponding guitar transcription. A beat-informed quantization is implemented to generate a note-level transcription. Furthermore, multi-task learning with frame-level and note-level estimations is also implemented to achieve robust training. We conducted experimental evaluations on our method using a publicly available acoustic guitar dataset. We confirmed that 1) the proposed method significantly outperforms the conventional method based on a CNN in frame-level estimation performance and that 2) the proposed method can also generate note-level guitar transcription while preserving high estimation performance.
Title: Note-level Automatic Guitar Transcription Using Attention Mechanism. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909869
K. Chumachenko, Alexandros Iosifidis, M. Gabbouj
This paper presents the problem of country code recognition from license plate images. We propose an approach based on character detection and subsequent clustering for country code localization. We further propose three weighted Edit Distance metrics for country of origin prediction from imperfect detections, namely based on character similarity, detection confidence, and relative operation importance. Experimental results show the benefit of the proposed approaches on real-world data. The proposed method is lightweight and independent of the underlying object detector, facilitating its application on edge devices.
Title: Weighted Edit Distance for Country Code Recognition in License Plates. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
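The character-similarity variant of a weighted edit distance can be sketched as follows: a standard Levenshtein recursion in which substituting visually similar characters costs less than substituting unrelated ones. This is only an illustration of the general idea; the look-alike pairs, the cost values, the candidate code list, and all function names are hypothetical and are not taken from the paper.

```python
def weighted_edit_distance(detected, reference, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance with a per-character-pair substitution cost."""
    rows, cols = len(detected) + 1, len(reference) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = detected[i - 1], reference[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,   # drop a detected character
                dist[i][j - 1] + ins_cost,   # add a missed character
                dist[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b)),
            )
    return dist[-1][-1]


# Hypothetical look-alike pairs; a real system would derive these from data.
SIMILAR = {frozenset("O0"), frozenset("I1"), frozenset("B8"), frozenset("S5")}


def similarity_cost(a, b):
    """Cheaper substitution for characters a detector is likely to confuse."""
    return 0.3 if frozenset((a, b)) in SIMILAR else 1.0


def predict_country(detected, codes):
    """Pick the candidate code with the smallest weighted distance."""
    return min(codes, key=lambda code: weighted_edit_distance(detected, code, similarity_cost))


print(predict_country("F1N", ["FIN", "EST", "S"]))
```

The other two variants named in the abstract (detection confidence and relative operation importance) would presumably change how the per-operation costs are set rather than the recursion itself, but the abstract does not give their exact form.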
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909676
Chunxuan Shi, Yongzhe Li, R. Tao
In this paper, we study the joint design of transmit waveforms and receive filter for multiple-input multiple-output (MIMO) radar with space-time adaptive processing (STAP), considering a complex environment that involves both clutter and jamming signals. We simultaneously design the fast-time waveform and the slow-time coding among transmitted pulses, together with the adaptive processing at the receiver, which leads to a three-dimensional STAP for MIMO radar. Specifically, we maximize the signal-to-jammer-plus-clutter-plus-noise ratio at the output while enforcing constant-modulus and similarity constraints on the transmitted waveforms. Based on this, we formulate the joint design as a non-convex optimization problem, and then recast it into a form that allows the alternating direction method of multipliers to be applied to find its solution. Moreover, we propose an algorithm with fast convergence speed for this design, whose effectiveness is verified by simulations.
Title: Design of Spatial Fast- and Slow-Time Waveforms and Receive Filter for MIMO Radar Space-Time Adaptive Processing. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909764
Guillermo García-Barrios, D. Krause, A. Politis, A. Mesaros, J. Gutiérrez-Arriola, R. Fraile
This work studies learning-based binaural sound source localization, under the influence of head rotation in reverberant conditions. Emphasis is on whether knowledge of head rotation can improve localization performance over the non-rotating case for the same acoustic scene. Simulations of binaural head signals of a static and rotating head were conducted, for 5 different rotation speeds and a wide range of reverberant conditions. Several convolutional recurrent neural network models were evaluated including a static head scenario, a model without rotation information, and distinct models differentiated on the way of manipulating the quaternions. The results were analyzed based on the direction-of-arrival error, and they show the importance of using quaternions as additional features, with the best localization accuracy obtained when using an additional convolutional branch that merges the features through addition or concatenation. Nevertheless, raw quaternion features presented lower performance than the static baseline model. Additionally, the study shows the importance of the analysis time window length when using information about head rotation.
Title: Binaural source localization using deep learning and head rotation information. Venue: 2022 30th European Signal Processing Conference (EUSIPCO).