Does Deep Learning-Based Super-Resolution Help Humans With Face Recognition?
Pub Date: 2022-04-20 | DOI: 10.3389/frsip.2022.854737
Erik Velan, M. Fontani, Sergio Carrato, M. Jerian
The last decade witnessed a renaissance of machine learning for image processing. Super-resolution (SR) is one of the areas where deep learning techniques have achieved impressive results, with a specific focus on the SR of facial images. Examining and comparing facial images is one of the critical activities in forensic video analysis; a compelling question is thus whether recent SR techniques could help face recognition (FR) performed by a human operator, especially in the challenging scenario where only very low-resolution images are available, as is typical of surveillance recordings. This paper addresses this question through a simple yet insightful experiment: we used two state-of-the-art deep learning-based SR algorithms to enhance very low-resolution faces of 30 worldwide celebrities. We then asked a heterogeneous group of more than 130 individuals to recognize them and compared the recognition accuracy against that achieved with a simple bicubic-interpolated version of the same faces. The results are somewhat surprising: despite an undisputed general superiority of SR-enhanced images in terms of visual appearance, SR techniques brought no appreciable advantage in overall recognition accuracy.
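For readers who want to reproduce the baseline condition, the bicubic-interpolated version of a face can be generated in a few lines. A minimal sketch with Pillow; the file name and the 4x upscaling factor are illustrative assumptions, not values from the paper:

```python
from PIL import Image

# Bicubic baseline: upsample a low-resolution face crop by 4x.
# "face_lr.png" is a hypothetical placeholder, e.g. a 32x32 crop.
lr_face = Image.open("face_lr.png")
w, h = lr_face.size
bicubic_face = lr_face.resize((4 * w, 4 * h), Image.BICUBIC)
bicubic_face.save("face_bicubic_x4.png")
```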
{"title":"Does Deep Learning-Based Super-Resolution Help Humans With Face Recognition?","authors":"Erik Velan, M. Fontani, Sergio Carrato, M. Jerian","doi":"10.3389/frsip.2022.854737","DOIUrl":"https://doi.org/10.3389/frsip.2022.854737","url":null,"abstract":"The last decade witnessed a renaissance of machine learning for image processing. Super-resolution (SR) is one of the areas where deep learning techniques have achieved impressive results, with a specific focus on the SR of facial images. Examining and comparing facial images is one of the critical activities in forensic video analysis; a compelling question is thus whether recent SR techniques could help face recognition (FR) made by a human operator, especially in the challenging scenario where very low resolution images are available, which is typical of surveillance recordings. This paper addresses such a question through a simple yet insightful experiment: we used two state-of-the-art deep learning-based SR algorithms to enhance some very low-resolution faces of 30 worldwide celebrities. We then asked a heterogeneous group of more than 130 individuals to recognize them and compared the recognition accuracy against the one achieved by presenting a simple bicubic-interpolated version of the same faces. Results are somehow surprising: despite an undisputed general superiority of SR-enhanced images in terms of visual appearance, SR techniques brought no considerable advantage in overall recognition accuracy.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80896581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatiotemporal Features Fusion From Local Facial Regions for Micro-Expressions Recognition
Pub Date: 2022-04-13 | DOI: 10.3389/frsip.2022.861469
Mouath Aouayeb, Catherine Soladié, W. Hamidouche, K. Kpalma, R. Séguier
Facial micro-expression (MiE) analysis has applications in various fields, including emotional intelligence, psychotherapy, and police investigation. However, because MiEs are fast, subtle, and local reactions, they are challenging for both humans and machines to detect and recognize. In this article, we propose a deep learning approach that addresses the local and temporal aspects of MiEs by learning spatiotemporal features from local facial regions. A distinctive feature of our method is the use of two fusion-based squeeze-and-excitation (SE) strategies that drive the model to learn the optimal combination of the spatiotemporal features extracted from each region. The proposed architecture improves on a previous automatic system for micro-expression recognition (MER) from local facial regions that used a composite deep learning model of a convolutional neural network (CNN) and long short-term memory (LSTM). Experiments on three spontaneous MiE datasets show that the proposed solution outperforms state-of-the-art approaches. Our code is available as open source at https://github.com/MouathAb/AnalyseMiE-CNN_LSTM_SE.
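As background for the fusion strategies, the generic squeeze-and-excitation block (Hu et al.) can be sketched in PyTorch as below. This is the standard SE building block, not the authors' exact fusion-based variant:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block: globally pool each channel,
    pass through a small bottleneck MLP, and reweight the channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        squeeze = x.mean(dim=(2, 3))              # global average pooling
        excite = self.fc(squeeze).view(b, c, 1, 1)
        return x * excite                         # channel-wise reweighting
```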
{"title":"Spatiotemporal Features Fusion From Local Facial Regions for Micro-Expressions Recognition","authors":"Mouath Aouayeb, Catherine Soladié, W. Hamidouche, K. Kpalma, R. Séguier","doi":"10.3389/frsip.2022.861469","DOIUrl":"https://doi.org/10.3389/frsip.2022.861469","url":null,"abstract":"Facial micro-expressions (MiEs) analysis has applications in various fields, including emotional intelligence, psychotherapy, and police investigation. However, because MiEs are fast, subtle, and local reactions, there is a challenge for humans and machines to detect and recognize them. In this article, we propose a deep learning approach that addresses the locality and the temporal aspects of MiE by learning spatiotemporal features from local facial regions. Our proposed method is particularly unique in that we use two fusion-based squeeze and excitation (SE) strategies to drive the model to learn the optimal combination of extracted spatiotemporal features from each area. The proposed architecture enhances a previous solution of an automatic system for micro-expression recognition (MER) from local facial regions using a composite deep learning model of convolutional neural network (CNN) and long short-term memory (LSTM). Experiments on three spontaneous MiE datasets show that the proposed solution outperforms state-of-the-art approaches. Our code is presented at https://github.com/MouathAb/AnalyseMiE-CNN_LSTM_SE as an open source.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90224397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Horizons in Single-Lead ECG Analysis From Devices to Data
Pub Date: 2022-04-11 | DOI: 10.3389/frsip.2022.866047
A. Abdou, S. Krishnan
Single-lead wearable electrocardiographic (ECG) devices for remote monitoring are emerging as critical enablers of long-term continuous health and wellness monitoring. These sensors make it simple to monitor chronically ill patients and the elderly in long-term care homes, and empower users focused on fitness and wellbeing with timely health and lifestyle information and metrics. This article addresses future developments in single-lead electrocardiogram (ECG) wearables: their design concepts, signal processing, machine learning (ML), and emerging healthcare applications. A literature review of several wearable ECG remote monitoring devices is first performed, covering the Apple Watch, Kardia, Zio, BioHarness, Bittium Faros, and Carnation Ambulatory Monitor. Zio showed the longest wear time, with patients wearing the patch for up to 14 days, but required users to mail the device to a processing center for analysis, while the Apple Watch and Kardia showed good-quality acquisition of raw ECG but are not continuous monitoring devices. The design considerations for single-lead ECG wearables can be classified into four dimensions: power needs, computational complexity, signal quality, and human factors. These dimensions reflect the hardware and software characteristics of ECG wearables and can act as a checklist for future single-lead ECG wearable designs. Trends in ECG denoising, signal processing, feature extraction, compressive sensing (CS), and remote monitoring applications are then surveyed to show the emerging opportunities and recent innovations in single-lead ECG wearables.
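As an illustration of the kind of denoising and feature-extraction stages surveyed, here is a minimal single-lead ECG sketch with SciPy: a band-pass filter followed by simple R-peak picking. The cutoffs and thresholds are illustrative choices, not values from the article:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_r_peaks(ecg: np.ndarray, fs: float) -> np.ndarray:
    """Denoise a single-lead ECG with a 0.5-40 Hz band-pass filter
    and locate R peaks; parameters are illustrative assumptions."""
    b, a = butter(4, [0.5 / (fs / 2), 40.0 / (fs / 2)], btype="band")
    clean = filtfilt(b, a, ecg)
    # R peaks: prominent maxima at least 250 ms apart (~240 bpm ceiling)
    peaks, _ = find_peaks(clean, distance=int(0.25 * fs),
                          prominence=0.5 * np.std(clean))
    return peaks
```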
{"title":"Horizons in Single-Lead ECG Analysis From Devices to Data","authors":"A. Abdou, S. Krishnan","doi":"10.3389/frsip.2022.866047","DOIUrl":"https://doi.org/10.3389/frsip.2022.866047","url":null,"abstract":"Single-lead wearable electrocardiographic (ECG) devices for remote monitoring are emerging as critical components of the viability of long-term continuous health and wellness monitoring applications. These sensors make it simple to monitor chronically ill patients and the elderly in long-term care homes, as well as empower users focused on fitness and wellbeing with timely health and lifestyle information and metrics. This article addresses the future developments in single-lead electrocardiogram (ECG) wearables, their design concepts, signal processing, machine learning (ML), and emerging healthcare applications. A literature review of multiple wearable ECG remote monitoring devices is first performed; Apple Watch, Kardia, Zio, BioHarness, Bittium Faros and Carnation Ambulatory Monitor. Zio showed the longest wear time with patients wearing the patch for 14 days maximum but required users to mail the device to a processing center for analysis. While the Apple Watch and Kardia showed good quality acquisition of raw ECG but are not continuous monitoring devices. The design considerations for single-lead ECG wearable devices could be classified as follows: power needs, computational complexity, signal quality, and human factors. These dimensions shadow hardware and software characteristics of ECG wearables and can act as a checklist for future single-lead ECG wearable designs. Trends in ECG de-noising, signal processing, feature extraction, compressive sensing (CS), and remote monitoring applications are later followed to show the emerging opportunities and recent innovations in single-lead ECG wearables.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76791046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multivariate Lipschitz Analysis of the Stability of Neural Networks
Pub Date: 2022-04-05 | DOI: 10.3389/frsip.2022.794469
K. Gupta, F. Kaakai, B. Pesquet-Popescu, J. Pesquet, Fragkiskos D. Malliaros
The stability of neural networks with respect to adversarial perturbations has been extensively studied. One of the main strategies consists of quantifying the Lipschitz regularity of neural networks. In this paper, we introduce a multivariate Lipschitz-constant-based stability analysis of fully connected neural networks that allows us to capture the influence of each input, or group of inputs, on the network's stability. Our approach relies on a suitable re-normalization of the input space, with the objective of performing a more precise analysis than the one provided by a global Lipschitz constant. We investigate the mathematical properties of the proposed multivariate Lipschitz analysis and show its usefulness in better understanding the sensitivity of the neural network with regard to groups of inputs. We present the results of this analysis through a new representation designed for machine learning practitioners and safety engineers, termed the Lipschitz star. The Lipschitz star is a graphical and practical tool for analyzing the sensitivity of a neural network model during its development with regard to different combinations of inputs. By leveraging this tool, we show that it is possible to build robust-by-design models using spectral normalization techniques to control the stability of a neural network, given a safety Lipschitz target. Thanks to our multivariate Lipschitz analysis, we can also measure the efficiency of adversarial training in inference tasks. We perform experiments on various open-access tabular datasets, and also on a real Thales Air Mobility industrial application subject to certification requirements.
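For context, the coarse global bound that the multivariate analysis refines can be computed directly: for a fully connected network with 1-Lipschitz activations (e.g., ReLU), the product of the spectral norms of the weight matrices upper-bounds the global Lipschitz constant. A minimal PyTorch sketch; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

def lipschitz_upper_bound(model: nn.Sequential) -> float:
    """Classical upper bound on the global Lipschitz constant:
    the product of the spectral norms (largest singular values)
    of the linear layers' weight matrices."""
    bound = 1.0
    for layer in model:
        if isinstance(layer, nn.Linear):
            bound *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
    return bound

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
print(lipschitz_upper_bound(net))
```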
{"title":"Multivariate Lipschitz Analysis of the Stability of Neural Networks","authors":"K. Gupta, F. Kaakai, B. Pesquet-Popescu, J. Pesquet, Fragkiskos D. Malliaros","doi":"10.3389/frsip.2022.794469","DOIUrl":"https://doi.org/10.3389/frsip.2022.794469","url":null,"abstract":"The stability of neural networks with respect to adversarial perturbations has been extensively studied. One of the main strategies consist of quantifying the Lipschitz regularity of neural networks. In this paper, we introduce a multivariate Lipschitz constant-based stability analysis of fully connected neural networks allowing us to capture the influence of each input or group of inputs on the neural network stability. Our approach relies on a suitable re-normalization of the input space, with the objective to perform a more precise analysis than the one provided by a global Lipschitz constant. We investigate the mathematical properties of the proposed multivariate Lipschitz analysis and show its usefulness in better understanding the sensitivity of the neural network with regard to groups of inputs. We display the results of this analysis by a new representation designed for machine learning practitioners and safety engineers termed as a Lipschitz star. The Lipschitz star is a graphical and practical tool to analyze the sensitivity of a neural network model during its development, with regard to different combinations of inputs. By leveraging this tool, we show that it is possible to build robust-by-design models using spectral normalization techniques for controlling the stability of a neural network, given a safety Lipschitz target. Thanks to our multivariate Lipschitz analysis, we can also measure the efficiency of adversarial training in inference tasks. We perform experiments on various open access tabular datasets, and also on a real Thales Air Mobility industrial application subject to certification requirements.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"111 3S 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87594009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Deep-Learning Based Framework for Source Separation, Analysis, and Synthesis of Choral Ensembles
Pub Date: 2022-04-05 | DOI: 10.3389/frsip.2022.808594
Pritish Chandna, Helena Cuesta, Darius Petermann, E. Gómez
Choral singing in the soprano, alto, tenor, and bass (SATB) format is a widely practiced and studied art form of significant cultural importance. Despite the popularity of the choral setting, it has received little attention in the field of Music Information Retrieval. However, the recent publication of high-quality choral singing datasets, as well as recent developments in deep learning-based methodologies for music and speech processing, has opened new avenues for research in this field. In this paper, we use some of the publicly available choral singing datasets to train and evaluate state-of-the-art source separation algorithms from the speech and music domains for the case of choral singing. Furthermore, we evaluate existing monophonic F0 estimators on the separated unison stems and propose an approximation of the perceived F0 of a unison signal. Additionally, we present a set of applications combining the proposed methodologies, including synthesizing a single singer's voice from the unison, and transposing and remixing the separated stems into a synthetic multi-singer choral signal. We finally conduct a set of listening tests to perform a perceptual evaluation of the results obtained with the proposed methodologies.
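As a reference point for the F0 evaluation, a monophonic F0 estimator can be as simple as a framewise autocorrelation peak-picker. The sketch below is a generic stand-in for the estimators evaluated in the paper, not the proposed unison-F0 approximation:

```python
import numpy as np

def autocorr_f0(frame: np.ndarray, fs: float,
                fmin: float = 60.0, fmax: float = 1000.0) -> float:
    """Toy autocorrelation F0 estimator for a single voiced frame:
    pick the lag with the strongest autocorrelation inside the
    plausible pitch range and convert it to Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```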
{"title":"A Deep-Learning Based Framework for Source Separation, Analysis, and Synthesis of Choral Ensembles","authors":"Pritish Chandna, Helena Cuesta, Darius Petermann, E. Gómez","doi":"10.3389/frsip.2022.808594","DOIUrl":"https://doi.org/10.3389/frsip.2022.808594","url":null,"abstract":"Choral singing in the soprano, alto, tenor and bass (SATB) format is a widely practiced and studied art form with significant cultural importance. Despite the popularity of the choral setting, it has received little attention in the field of Music Information Retrieval. However, the recent publication of high-quality choral singing datasets as well as recent developments in deep learning based methodologies applied to the field of music and speech processing, have opened new avenues for research in this field. In this paper, we use some of the publicly available choral singing datasets to train and evaluate state-of-the-art source separation algorithms from the speech and music domains for the case of choral singing. Furthermore, we evaluate existing monophonic F0 estimators on the separated unison stems and propose an approximation of the perceived F0 of a unison signal. Additionally, we present a set of applications combining the proposed methodologies, including synthesizing a single singer voice from the unison, and transposing and remixing the separated stems into a synthetic multi-singer choral signal. We finally conduct a set of listening tests to perform a perceptual evaluation of the results we obtain with the proposed methodologies.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76723978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CEL-Unet: Distance Weighted Maps and Multi-Scale Pyramidal Edge Extraction for Accurate Osteoarthritic Bone Segmentation in CT Scans
Pub Date: 2022-04-05 | DOI: 10.3389/frsip.2022.857313
Matteo Rossi, L. Marsilio, L. Mainardi, A. Manzotti, P. Cerveri
Unet architectures are being investigated for automatic image segmentation of bones in CT scans because of their ability to address size-varying anatomies and pathological deformations. Nonetheless, changes in mineral density, narrowing of joint spaces, and the formation of largely irregular osteophytes may easily disrupt automation, requiring extensive manual refinement. A novel Unet variant, called CEL-Unet, is presented to boost the segmentation quality of the femur and tibia in the osteoarthritic knee joint. The neural network embeds a region-aware and two contour-aware branches in the decoding path. The paper features three main technical novelties: 1) directed connections between the contour and region branches, applied progressively at different decoding scales; 2) pyramidal edge extraction in the contour branch to perform multi-resolution edge processing; 3) a distance-weighted cross-entropy loss function to increase delineation quality at the sharp edges of the shapes. A set of 700 knee CT scans was used to train the model and test segmentation performance. Qualitatively, CEL-Unet correctly segmented cases where state-of-the-art architectures failed. Quantitatively, the Jaccard indexes for femur and tibia segmentation were 0.98 and 0.97, with median 3D reconstruction errors below 0.80 and 0.60 mm, respectively, outperforming competitive Unet models. The results were evaluated against knee arthroplasty planning based on personalized surgical instruments (PSI). Excellent agreement with reference data was found for the femoral (0.11°) and tibial (0.05°) alignments of the distal and proximal cuts computed on the reconstructed surfaces. The bone segmentation was effective for large pathological deformations and osteophytes, making the technique potentially usable in PSI-based surgical planning, where the reconstruction accuracy of the bony shapes is one of the main critical factors for the success of the operation.
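Novelty 3) can be illustrated with a short sketch: weight the per-pixel cross-entropy by a map that peaks near mask boundaries, computed with a Euclidean distance transform. The exponential weighting and the binary foreground assumption are illustrative; the exact CEL-Unet weighting may differ:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def distance_weighted_ce(logits: torch.Tensor, target: torch.Tensor,
                         alpha: float = 2.0) -> torch.Tensor:
    """Distance-weighted cross-entropy sketch.
    logits: (B, C, H, W); target: (B, H, W) integer labels."""
    weights = []
    for mask in target.cpu().numpy():
        # distance of every pixel to the nearest foreground/background
        # boundary (binary treatment of the foreground for simplicity)
        inside = distance_transform_edt(mask > 0)
        outside = distance_transform_edt(mask == 0)
        dist = inside + outside
        weights.append(1.0 + alpha * np.exp(-dist))  # peaks at edges
    w = torch.as_tensor(np.stack(weights), dtype=logits.dtype,
                        device=logits.device)
    ce = F.cross_entropy(logits, target, reduction="none")
    return (w * ce).mean()
```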
{"title":"CEL-Unet: Distance Weighted Maps and Multi-Scale Pyramidal Edge Extraction for Accurate Osteoarthritic Bone Segmentation in CT Scans","authors":"Matteo Rossi, L. Marsilio, L. Mainardi, A. Manzotti, P. Cerveri","doi":"10.3389/frsip.2022.857313","DOIUrl":"https://doi.org/10.3389/frsip.2022.857313","url":null,"abstract":"Unet architectures are being investigated for automatic image segmentation of bones in CT scans because of their ability to address size-varying anatomies and pathological deformations. Nonetheless, changes in mineral density, narrowing of joint spaces and formation of largely irregular osteophytes may easily disrupt automatism requiring extensive manual refinement. A novel Unet variant, called CEL-Unet, is presented to boost the segmentation quality of the femur and tibia in the osteoarthritic knee joint. The neural network embeds region-aware and two contour-aware branches in the decoding path. The paper features three main technical novelties: 1) directed connections between contour and region branches progressively at different decoding scales; 2) pyramidal edge extraction in the contour branch to perform multi-resolution edge processing; 3) distance-weighted cross-entropy loss function to increase delineation quality at the sharp edges of the shapes. A set of 700 knee CT scans was used to train the model and test segmentation performance. Qualitatively CEL-Unet correctly segmented cases where the state-of-the-art architectures failed. Quantitatively, the Jaccard indexes of femur and tibia segmentation were 0.98 and 0.97, with median 3D reconstruction errors less than 0.80 and 0.60 mm, overcoming competitive Unet models. The results were evaluated against knee arthroplasty planning based on personalized surgical instruments (PSI). Excellent agreement with reference data was found for femoral (0.11°) and tibial (0.05°) alignments of the distal and proximal cuts computed on the reconstructed surfaces. The bone segmentation was effective for large pathological deformations and osteophytes, making the techniques potentially usable in PSI-based surgical planning, where the reconstruction accuracy of the bony shapes is one of the main critical factors for the success of the operation.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"91 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86707290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imagined Speech Classification Using Six Phonetically Distributed Words
Pub Date: 2022-03-25 | DOI: 10.3389/frsip.2022.760643
Y. Varshney, Azizuddin Khan
Imagined speech can be used to send commands without any muscle movement or audible output. Research in this area is still at an early stage, and there is a shortage of open-access datasets for imagined speech analysis. In this work, we propose an openly accessible electroencephalography (EEG) dataset for six imagined words. We selected six phonetically distributed, monosyllabic, and emotionally neutral words from the W-22 CID word lists. The phonetic distribution of the words covers different places of consonant articulation and different positions of tongue advancement for vowel pronunciation. The selected words were “could,” “yard,” “give,” “him,” “there,” and “toe.” The experiment was carried out with 15 subjects, who performed the overt and imagined speech task for each displayed word. Each word was presented 50 times in random order. EEG signals were recorded during the experiment using a 64-channel EEG acquisition system at a sampling rate of 2,048 Hz. A preliminary analysis of the recorded data is presented by classifying the EEGs corresponding to the imagined words. The achieved accuracy is above chance level for all subjects, which suggests that the recorded EEGs contain distinctive information about the imagined words.
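A minimal baseline for this kind of per-subject classification could look as follows; the epoch shapes and the linear-SVM pipeline are illustrative assumptions, and random arrays stand in for the actual recordings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical epochs: (n_trials, n_channels, n_samples), one of six
# word labels per trial. Random placeholders stand in for real EEG.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((300, 64, 512))
labels = rng.integers(0, 6, size=300)

X = epochs.reshape(len(epochs), -1)            # naive flattening
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, labels, cv=5)
print("accuracy: %.2f (chance ~ 1/6 = 0.17)" % scores.mean())
```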
{"title":"Imagined Speech Classification Using Six Phonetically Distributed Words","authors":"Y. Varshney, Azizuddin Khan","doi":"10.3389/frsip.2022.760643","DOIUrl":"https://doi.org/10.3389/frsip.2022.760643","url":null,"abstract":"Imagined speech can be used to send commands without any muscle movement or emitting audio. The current status of research is in the early stage, and there is a shortage of open-access datasets for imagined speech analysis. We have proposed an openly accessible electroencephalograph (EEG) dataset for six imagined words in this work. We have selected six phonetically distributed, monosyllabic, and emotionally neutral words from W-22 CID word lists. The phonetic distribution of words consisted of the different places of consonants’ articulation and different positions of tongue advancement for vowel pronunciation. The selected words were “could,” “yard,” “give,” “him,” “there,” and “toe.” The experiment was performed over 15 subjects who performed the overt and imagined speech task for the displayed word. Each word was presented 50 times in random order. EEG signals were recorded during the experiment using a 64-channel EEG acquisition system with a sampling rate of 2,048 Hz. A preliminary analysis of the recorded data is presented by performing the classification of EEGs corresponding to the imagined words. The achieved accuracy is above the chance level for all subjects, which suggests that the recorded EEGs contain distinctive information about the imagined words.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73016808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CRLBs for Location and Velocity Estimation for MIMO Radars in CES-Distributed Clutter
Pub Date: 2022-03-25 | DOI: 10.3389/frsip.2022.822285
N. Rojhani, M. Greco, F. Gini
In this article, we investigate the problem of jointly estimating target location and velocity for a widely separated multiple-input multiple-output (MIMO) radar operating in correlated non-Gaussian clutter, modeled by a complex elliptically symmetric (CES) distribution. More specifically, we derive the Cramér–Rao lower bounds (CRLBs) when the target follows the Swerling 0 model and the clutter is complex t-distributed. We thoroughly analyze the impact of clutter correlation and spikiness to provide an accurate performance assessment.
Index terms: Cramér–Rao lower bounds (CRLBs), MIMO radar, location and velocity estimation, performance analysis, complex elliptically symmetric (CES) distribution, complex t-distribution.
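For simulation studies of such bounds, correlated complex t-distributed clutter is commonly generated as a Gaussian scale mixture. A minimal sketch using the standard CES construction; the parameters are illustrative, not the paper's setup:

```python
import numpy as np

def complex_t_clutter(n: int, cov: np.ndarray, dof: float,
                      rng=np.random.default_rng()) -> np.ndarray:
    """Draw n correlated complex t-distributed clutter vectors as
    z = x / sqrt(g / dof), with x ~ CN(0, cov) and g ~ chi^2_dof."""
    d = cov.shape[0]
    L = np.linalg.cholesky(cov)
    x = (rng.standard_normal((n, d)) +
         1j * rng.standard_normal((n, d))) / np.sqrt(2.0)
    x = x @ L.T.conj()                     # impose the covariance
    g = rng.chisquare(dof, size=(n, 1))    # mixing variable
    return x / np.sqrt(g / dof)
```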
{"title":"CRLBs for Location and Velocity Estimation for MIMO Radars in CES-Distributed Clutter","authors":"N. Rojhani, M. Greco, F. Gini","doi":"10.3389/frsip.2022.822285","DOIUrl":"https://doi.org/10.3389/frsip.2022.822285","url":null,"abstract":"In this article, we investigate the problem of jointly estimating target location and velocity for widely separated multiple-input multiple-output (MIMO) radar operating in correlated non-Gaussian clutter, modeled by a complex elliptically symmetric (CES) distribution. More specifically, we derive the Cramér–Rao lower bounds (CRLBs) when the target is modeled by the Swerling 0 model and the clutter is complex t-distributed. We thoroughly analyze the impact of the clutter correlation and spikiness to provide accurate performance estimation. Index terms—Cramér–Rao lower bounds (CRLBs), MIMO radar, location and velocity estimation, performance analysis, complex elliptically symmetric (CES) distributed, and complex t-distribution.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80981293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Depth Map Super-Resolution via Cascaded Transformers Guidance
Pub Date: 2022-03-24 | DOI: 10.3389/frsip.2022.847890
I. Ariav, I. Cohen
Depth information captured by affordable depth sensors is characterized by low spatial resolution, which limits potential applications. Several methods have recently been proposed for guided super-resolution of depth maps using convolutional neural networks to overcome this limitation. In a guided super-resolution scheme, high-resolution depth maps are inferred from low-resolution ones with the additional guidance of a corresponding high-resolution intensity image. However, these methods are still prone to texture copying issues due to improper guidance by the intensity image. We propose a multi-scale residual deep network for depth map super-resolution. A cascaded transformer module incorporates high-resolution structural information from the intensity image into the depth upsampling process. The proposed cascaded transformer module achieves linear complexity in image resolution, making it applicable to high-resolution images. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art techniques for guided depth super-resolution.
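One common route to linear complexity in image resolution is to reorder the attention product so the quadratic token-by-token matrix is never formed. The sketch below shows this generic "efficient attention" trick; the authors' cascaded transformer module may use a different formulation:

```python
import torch

def linear_attention(q: torch.Tensor, k: torch.Tensor,
                     v: torch.Tensor) -> torch.Tensor:
    """Attention computed as softmax(q) @ (softmax(k)^T @ v), costing
    O(N * d^2) instead of O(N^2 * d) for N tokens of dimension d.
    q, k, v: (B, N, d)."""
    q = q.softmax(dim=-1)            # normalize over feature dim
    k = k.softmax(dim=1)             # normalize over token dim
    context = k.transpose(1, 2) @ v  # (B, d, d) global context
    return q @ context               # (B, N, d)
```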
{"title":"Depth Map Super-Resolution via Cascaded Transformers Guidance","authors":"I. Ariav, I. Cohen","doi":"10.3389/frsip.2022.847890","DOIUrl":"https://doi.org/10.3389/frsip.2022.847890","url":null,"abstract":"Depth information captured by affordable depth sensors is characterized by low spatial resolution, which limits potential applications. Several methods have recently been proposed for guided super-resolution of depth maps using convolutional neural networks to overcome this limitation. In a guided super-resolution scheme, high-resolution depth maps are inferred from low-resolution ones with the additional guidance of a corresponding high-resolution intensity image. However, these methods are still prone to texture copying issues due to improper guidance by the intensity image. We propose a multi-scale residual deep network for depth map super-resolution. A cascaded transformer module incorporates high-resolution structural information from the intensity image into the depth upsampling process. The proposed cascaded transformer module achieves linear complexity in image resolution, making it applicable to high-resolution images. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art techniques for guided depth super-resolution.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73350761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks
Pub Date: 2022-03-17 | DOI: 10.3389/frsip.2022.800003
Mariem Bouafif Mansali, Pablo Pérez Zarazaga, Tom Bäckström, Z. Lachiri
Speech source localization (SSL) and its applications offer great possibilities for the design of speaker local positioning systems with wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-ends can outperform traditional algorithms for SSL when trained to work in specific domains, depending on factors like reverberation and noise levels. However, such localization models operate directly on raw sensor observations, without accounting for transmission losses in WASNs. In contrast, when sensors reside in separate real-life devices, sensor data must be quantized, encoded, and transmitted, which degrades localization performance, especially when the transmission bitrate is low. In this work, we investigate the effect of low-bitrate transmission on a direction of arrival (DoA) estimator. We analyze the performance of a deep neural network (DNN) based framework as a function of the audio encoding bitrate for compressed signals, employing recent communication codecs including PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kbit/s can improve the accuracy significantly, and up to 50% of the accuracy degradation at low bitrates can be recovered for almost all codecs. Our results further show that when one of the two channels can be encoded at a bitrate higher than 32 kbit/s, the best accuracy of the trained model is obtained by keeping the raw data for the second channel; for lower bitrates, however, it is preferable to encode the two channels similarly. More importantly, for practical applications, a more generalized model trained with a randomly selected codec for each channel shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.
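As a concrete example of the traditional SSL algorithms such data-driven front-ends are compared against, here is a classical GCC-PHAT time-difference-of-arrival estimator for a two-microphone pair; this is a generic baseline sketch, not the paper's DNN framework:

```python
import numpy as np

def gcc_phat_tdoa(x1: np.ndarray, x2: np.ndarray, fs: float) -> float:
    """GCC-PHAT: cross-correlate two microphone signals in the
    frequency domain with phase-transform weighting and return the
    time delay (seconds) at the correlation peak."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n=n), np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

The estimated delay, together with the microphone spacing and the speed of sound, then yields the DoA via simple geometry.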
{"title":"Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks","authors":"Mariem Bouafif Mansali, Pablo Pérez Zarazaga, Tom Bäckström, Z. Lachiri","doi":"10.3389/frsip.2022.800003","DOIUrl":"https://doi.org/10.3389/frsip.2022.800003","url":null,"abstract":"The use of speech source localization (SSL) and its applications offer great possibilities for the design of speaker local positioning systems with wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-ends can outperform traditional algorithms for SSL when trained to work in specific domains, depending on factors like reverberation and noise levels. However, such localization models consider localization directly from raw sensor observations, without consideration for transmission losses in WASNs. In contrast, when sensors reside in separate real-life devices, we need to quantize, encode and transmit sensor data, decreasing the performance of localization, especially when the transmission bitrate is low. In this work, we investigate the effect of low bitrate transmission on a Direction of Arrival (DoA) estimator. We analyze a deep neural network (DNN) based framework performance as a function of the audio encoding bitrate for compressed signals by employing recent communication codecs including PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kB/s can improve the accuracy significantly, and up to 50% of accuracy degradation at a low bitrate for almost all codecs can be recovered. Our results further show that for the best accuracy of the trained model when one of the two channels can be encoded with a bitrate higher than 32 kB/s, it is optimal to have the raw data for the second channel. However, for a lower bitrate, it is preferable to similarly encode the two channels. More importantly, for practical applications, a more generalized model trained with a randomly selected codec for each channel, shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79019704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}