Online QoS estimation for vehicular radio environments
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909612
Rodrigo Hernangómez, Alexandros Palaios, Gayathri Guruvayoorappan, Martin Kasparick, N. Ain, Sławomir Stańczak
Quality of service (QoS) estimation is a key enabler in wireless networks. This has been facilitated by the increasing capabilities of machine learning (ML). However, ML algorithms often underperform when presented with non-stationary data, which is typically the case for radio environments. In such environments, ML schemes might require extra signaling for retraining. In this paper, we propose an approach to online QoS estimation in which a trained model can be taken as a base estimator and fine-tuned with information from the user equipment (UE) and the cell itself. The proposed approach is based on the Adaptive Random Forest (ARF) algorithm, which operates on streaming data and reacts to concept drift, i.e., to changes in the data's statistical properties. This effectively allows parts of the ML model to be retrained as vehicular UEs visit diverse radio environments. We evaluate this method with real data from an extensive measurement campaign in a cellular test network that covered diverse radio environments.
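As a rough illustration of the streaming setup described above, the following sketch fine-tunes an Adaptive Random Forest regressor sample by sample in a prequential (predict-then-learn) loop with the `river` library. The module path (`river.forest` vs. `river.ensemble`, depending on the river version) and the feature names are assumptions, not taken from the paper.

```python
# Hedged sketch: prequential fine-tuning of an Adaptive Random Forest regressor
# on streaming QoS samples with the `river` library. Module path and feature
# names are illustrative assumptions.
from river import forest, metrics

model = forest.ARFRegressor(n_models=10, seed=42)   # drift detection runs per tree
metric = metrics.MAE()

def run_stream(samples):
    """samples: iterable of (feature_dict, qos_target) pairs from UE/cell reports."""
    for x, y in samples:
        metric.update(y, model.predict_one(x))       # evaluate before learning
        model.learn_one(x, y)                        # online update; drifted trees get replaced
    return metric

# Synthetic UE/cell measurements with a slow trend (illustrative only)
samples = [({"rsrp": -90.0 - 0.1 * i, "sinr": 12.0 - 0.05 * i, "cell_load": 0.4},
            20.0 - 0.02 * i) for i in range(200)]
print(run_stream(samples))
```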
sEMG feature extraction using Generalized Discrete Orthonormal Stockwell Transform and Modified Multi-Dimensional Scaling
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909783
Somar Karheily, A. Moukadem, Jean-Baptiste Courbot, D. Abdeslam
This paper proposes a method based on a generalized version of the Discrete Orthonormal Stockwell Transform (GDOST) with a Gaussian window to extract features from surface electromyography (sEMG) signals in order to identify hand movements. The feature space derived from the GDOST is then reduced by applying a modified Multi-Dimensional Scaling (MDS) method. The proposed modification to MDS consists of using a translation in the kernel construction instead of the direct distance calculation. The results are compared with another study on the same dataset in which the usual DOST and MDS are applied. We achieved significant improvements in classification accuracy, attaining 97.56% for 17 hand movements.
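For background on the dimensionality-reduction step, here is a minimal NumPy sketch of classical (Torgerson) MDS from a pairwise distance matrix. The paper's modified MDS builds its kernel via a translation instead of directly from distances; that modification is not reproduced here.

```python
# Background sketch: classical (Torgerson) MDS from a distance matrix in plain NumPy.
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered (Gram) kernel
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # k largest eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

pts = np.random.default_rng(0).normal(size=(6, 4))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(classical_mds(D, k=2))
```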
Time-varying Normalizing Flow for Generative Modeling of Dynamical Signals
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909640
Anubhab Ghosh, Aleix Espuña Fontcuberta, M. Abdalmoaty, S. Chatterjee
We develop a time-varying normalizing flow (TVNF) for explicit generative modeling of dynamical signals. Being explicit, it can generate samples of dynamical signals and compute the likelihood of a given dynamical signal sample. In the proposed model, the signal flow in the layers of the normalizing flow is a function of time, which is realized using an encoded representation that is the output of a recurrent neural network (RNN). Given a set of dynamical signals, the parameters of the TVNF are learned using the maximum-likelihood approach in conjunction with gradient descent (backpropagation). The use of the proposed model is illustrated for a toy application scenario: a maximum-likelihood-based speech-phone classification task.
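A minimal PyTorch sketch of the time-varying idea follows: a GRU summarizes past samples and emits a per-step affine transform whose log-determinant enters the likelihood. The architecture and dimensions are illustrative assumptions, not the paper's TVNF.

```python
# Minimal sketch: an RNN-conditioned, per-time-step affine flow trained by maximum likelihood.
import torch
import torch.nn as nn

class TinyTVNF(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * dim)          # per-step log-scale and shift

    def log_prob(self, x):                              # x: (batch, time, dim)
        b = x.shape[0]
        ctx, _ = self.rnn(x[:, :-1, :])                 # context from x_{<t}
        ctx = torch.cat([torch.zeros(b, 1, self.rnn.hidden_size), ctx], dim=1)
        log_s, mu = self.head(ctx).chunk(2, dim=-1)
        z = (x - mu) * torch.exp(-log_s)                # inverse affine map
        log_base = -0.5 * (z ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(dim=(1, 2))
        return log_base - log_s.sum(dim=(1, 2))         # + log|det dz/dx|

model = TinyTVNF(dim=3)
x = torch.randn(4, 10, 3)
loss = -model.log_prob(x).mean()                        # maximum likelihood via backprop
loss.backward()
print(float(loss))
```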
Recurrent Neural Network-based Estimation and Correction of Relative Transfer Function for Preserving Spatial Cues in Speech Separation
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909636
Zicheng Feng, Yu Tsao, Fei Chen
Although deep learning-based algorithms have achieved great success in single-channel and multi-channel speech separation tasks, limited studies have focused on binaural output and the preservation of spatial cues. Existing methods preserve spatial cues only indirectly by enhancing signal-to-noise ratios (SNRs), and the accuracy of spatial cue preservation remains unsatisfactory. A framework was previously proposed to directly restore the spatial cues of the separated speech by applying relative transfer function (RTF) estimation and correction after speech separation. To further improve this framework, a new RTF estimator based on a recurrent neural network is proposed in this study, which directly estimates the RTF from the separated speech and the noisy mixture. The upgraded framework was evaluated on the spatialized WSJ0-2mix dataset with diffuse noise. Experimental results showed that the interaural time difference and interaural level difference errors of the separated speech were significantly reduced after RTF correction, without sacrificing SNR. The new RTF estimator further improved the performance of the system, with a model about five times smaller than the previous one. As the proposed framework does not rely on any specific type of model structure, it can be incorporated with both multi-channel and single-channel speech separation models.
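For orientation, the textbook cross-spectrum RTF estimate and its re-application to a separated reference signal can be sketched with SciPy as below. This stands in for the paper's RNN-based estimator purely for illustration; the toy signals are synthetic.

```python
# Hedged sketch: classical CSD-based RTF estimation and correction (not the paper's RNN estimator).
import numpy as np
from scipy.signal import csd, welch, stft, istft

def estimate_rtf(ref, other, fs, nperseg=512):
    _, s_ro = csd(ref, other, fs=fs, nperseg=nperseg)   # cross power spectrum
    _, s_rr = welch(ref, fs=fs, nperseg=nperseg)        # reference auto spectrum
    return s_ro / (s_rr + 1e-12)                        # H(f) ~ S_ro / S_rr

def apply_rtf(ref_est, rtf, fs, nperseg=512):
    _, _, Z = stft(ref_est, fs=fs, nperseg=nperseg)
    _, out = istft(Z * rtf[:, None], fs=fs, nperseg=nperseg)
    return out                                          # re-imposes interaural cues on the estimate

fs = 16000
rng = np.random.default_rng(0)
left = rng.normal(size=fs)
right = 0.8 * np.roll(left, 3) + 0.01 * rng.normal(size=fs)  # attenuated, delayed copy
right_hat = apply_rtf(left, estimate_rtf(left, right, fs), fs)
```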
Efficient Robust Graph Learning Based on Minimax Concave Penalty and $\gamma$-Cross Entropy
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909870
Tatsuya Koyakumaru, M. Yukawa
This paper presents an efficient robust method to learn sparse graphs from contaminated data. Specifically, a convex-analytic approach using the minimax concave penalty is formulated via the so-called $\gamma$-lasso, which exploits the $\gamma$-cross entropy. We devise a weighting technique that designs the data weights based on the $\ell_1$ distance in addition to the Mahalanobis distance, in order to avoid possible failures of outlier rejection due to the combinatorial graph Laplacian structure. Numerical examples show that the proposed method significantly outperforms $\gamma$-lasso and tlasso, as well as existing non-robust graph learning methods, in contaminated situations.
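For reference, the proximal operator of the minimax concave penalty (the "firm thresholding" rule, for $\gamma > 1$) is sketched below in NumPy. This is standard background for MCP-based sparse estimation, not the paper's full graph-learning algorithm.

```python
# Background sketch: proximal operator of the minimax concave penalty (firm thresholding).
import numpy as np

def mcp_prox(x, lam, gamma):
    ax = np.abs(x)
    return np.where(ax <= lam, 0.0,
           np.where(ax <= gamma * lam,
                    np.sign(x) * (ax - lam) / (1.0 - 1.0 / gamma),  # shrunk region
                    x))                                             # unbiased region

print(mcp_prox(np.array([-3.0, -0.8, 0.2, 1.5, 4.0]), lam=1.0, gamma=3.0))
# small entries are zeroed, mid-range entries are shrunk, large ones pass through unchanged
```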
A Feature-based Approach on Contact-less Blood Pressure Estimation from Video Data
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909563
Carolin Wuerich, Eva-Maria Humm, C. Wiede, Gregor Schiele
Conventional blood pressure monitors and sensors have several limitations in terms of accuracy, measurement time, comfort, or safety. To address these limitations, we realized and tested a surrogate-based contact-less blood pressure estimation method that relies on a single remote photoplethysmogram (rPPG) captured by a camera. From this rPPG signal, we compute 120 features and perform a sequential forward feature selection to obtain the best subset of features. With a multilayer perceptron model, we obtain a mean absolute error ± standard deviation of $5.50 \pm 4.52$ mmHg for systolic pressure and $3.73 \pm 2.86$ mmHg for diastolic pressure. In contrast to previous studies, our model is trained and tested on a data set including normotensive, pre-hypertensive and hypertensive values.
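The overall recipe (sequential forward feature selection followed by a multilayer perceptron regressor) might be sketched with scikit-learn as below, on synthetic stand-ins for the 120 rPPG features. A Ridge model is used inside the selector only to keep the sketch fast; that choice, the target construction, and all hyperparameters are assumptions, not the paper's setup.

```python
# Hedged sketch: forward feature selection + MLP regression on synthetic stand-in data.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))                                      # 120 synthetic "rPPG features"
y = 120 + 10 * X[:, :5].sum(axis=1) + rng.normal(scale=3, size=300)  # synthetic systolic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
selector = SequentialFeatureSelector(Ridge(), n_features_to_select=10,
                                     direction="forward", cv=3)
model = make_pipeline(StandardScaler(), selector,
                      MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))
model.fit(X_tr, y_tr)
print("MAE [mmHg]:", mean_absolute_error(y_te, model.predict(X_te)))
```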
Fault-tolerant Radar Signal Processing using Selective Observation Windows and Peak Detection
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909550
Michael Beyer, A. Guntoro, Holger Blume
Soft errors, such as bit flips, pose a serious threat to the functional safety of systems. Thus, ensuring correct operation even in the presence of errors is particularly relevant for safety-critical applications. In this paper, we present a novel error detection and mitigation method for parallel FFTs in radar signal processing. We systematically define small observation windows in the 2D spectrum to detect peaks caused by soft errors. This enables protecting FFTs with several orders of magnitude lower computational overhead compared to related work. We conduct fault injection experiments to validate our method. Our experiments show that targets can be reliably detected even at higher error rates, where more than 500 bit flips are present.
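A toy version of the window-based check could look like this: flag any predefined region of the range-Doppler magnitude map whose peak rises far above that region's median level. The window placement, the threshold, and the injected error are illustrative assumptions, not the paper's procedure.

```python
# Toy sketch: peak checks inside small observation windows of a 2D FFT magnitude map.
import numpy as np

def check_windows(spectrum_db, windows, margin_db=15.0):
    return [spectrum_db[r0:r1, d0:d1].max() - np.median(spectrum_db[r0:r1, d0:d1]) > margin_db
            for (r0, r1, d0, d1) in windows]

rng = np.random.default_rng(1)
rd = 20 * np.log10(np.abs(np.fft.fft2(rng.normal(size=(128, 64)))) + 1e-9)  # toy range-Doppler map
rd[100, 50] += 60.0                                  # simulated soft-error peak
windows = [(96, 112, 48, 64), (0, 16, 0, 16)]        # illustrative observation windows
print(check_windows(rd, windows))                    # first window should be flagged
```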
Auto-weighted Sequential Wasserstein Distance and Application to Sequence Matching
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909780
Mitsuhiko Horie, Hiroyuki Kasai
Sequence matching problems have been central to the field of data analysis for decades. Such problems arise in widely diverse areas including computer vision, speech processing, bioinformatics, and natural language processing. However, solving such problems efficiently is difficult because one must consider temporal consistency, neighborhood structure similarity, robustness to noise and outliers, and flexibility in start-end matching points. This paper proposes a shape-aware Wasserstein distance between sequences built upon the optimal transport (OT) framework. The proposed distance considers similarity measures of the elements, their neighborhood structures, and their temporal positions. We incorporate these similarity measures into three ground cost matrices of the OT formulation. A noteworthy contribution is that we formulate these measures as independent OT distances with a single shared optimal transport matrix and adjust their weights automatically according to their effects on the total OT distance. Numerical evaluations suggest that the sequence matching method using our proposed Wasserstein distance robustly outperforms state-of-the-art methods across different real-world datasets.
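A fixed-weight variant of the idea — an entropic OT distance over a ground cost that mixes a feature term and a temporal-position term — can be sketched with the POT library as below. The paper instead combines three cost terms as separate OT distances with a shared transport plan and automatically adjusted weights.

```python
# Hedged sketch: sequence OT distance with a fixed mix of feature and temporal costs (POT library).
import numpy as np
import ot  # POT: Python Optimal Transport

def seq_ot(X, Y, alpha=0.5, reg=0.05):
    n, m = len(X), len(Y)
    c_feat = ot.dist(X, Y)                                     # squared Euclidean costs
    c_time = ot.dist(np.linspace(0, 1, n)[:, None], np.linspace(0, 1, m)[:, None])
    C = (1 - alpha) * c_feat / c_feat.max() + alpha * c_time / c_time.max()
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)            # uniform sequence weights
    return ot.sinkhorn2(a, b, C, reg)                          # entropic OT cost

X = np.random.default_rng(0).normal(size=(20, 3))
print(seq_ot(X, X + 0.05), seq_ot(X, (X + 0.05)[::-1]))        # reversed order costs more
```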
Location-invariant representations for acoustic scene classification
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909672
Akansha Tyagi, Padmanabhan Rajan
High intra-class variance is one of the significant challenges in solving the problem of acoustic scene classification. This work identifies the recording location (or city) of an audio sample as a source of intra-class variation. We overcome this variation by utilising multi-view learning, where each recording location is considered as a view. Canonical correlation analysis (CCA) based multi-view algorithms learn a subspace where samples from the same class are brought together, and samples from different classes are moved apart, irrespective of the views. By considering cities as views, and by using several variants of CCA algorithms, we show that intra-class variation can be reduced, and location-invariant representations can be learnt. The proposed method demonstrates an improvement of more than 8% on the DCASE 2018 and 2019 datasets, when compared to not using the view information.
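Treating two cities as two views, a plain two-view CCA already illustrates the mechanism; the sketch below uses scikit-learn on synthetic data with a shared latent "scene content". The synthetic data and the two-view restriction are assumptions, whereas the paper uses multi-view CCA variants over several cities.

```python
# Hedged sketch: two-view CCA as a stand-in for the multi-view CCA variants used in the paper.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
scene = rng.normal(size=(500, 8))                              # shared scene content
view_city_a = scene @ rng.normal(size=(8, 40)) + 0.5 * rng.normal(size=(500, 40))
view_city_b = scene @ rng.normal(size=(8, 40)) + 0.5 * rng.normal(size=(500, 40))

cca = CCA(n_components=8).fit(view_city_a, view_city_b)
za, zb = cca.transform(view_city_a, view_city_b)               # city-invariant-ish subspace
print(np.corrcoef(za[:, 0], zb[:, 0])[0, 1])                   # first canonical correlation
```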
Speech Enhancement Using Augmented SSL CycleGAN
Pub Date: 2022-08-29 | DOI: 10.23919/eusipco55093.2022.9909754
B. Popović, Lidija Krstanović, M. Janev, S. Suzic, Tijana V. Nosek, J. Galic
The purpose of single-channel speech enhancement is to attenuate the noise component of noisy speech in order to increase the intelligibility and the perceived quality of the speech component. One such approach uses deep neural networks to transform noisy speech features into clean speech by minimizing the mean squared error between the degraded and the clean features using paired datasets. More recently, an unpaired-datasets approach, CycleGAN speech enhancement, was proposed, obtaining state-of-the-art results even though no supervision was used during training. Moreover, only a small amount of noisy speech data is usually accessible in comparison to clean speech. Therefore, in this paper, an augmented semi-supervised CycleGAN speech enhancement algorithm is proposed, where only a small percentage of the training database contains actual paired data. As a consequence, this prevents overfitting of the discriminator corresponding to the scarce noisy speech domain during the initial training stages. The algorithm also augments that discriminator by periodically adding clean speech samples, transformed by the inverse network, into the pool of the scarce noisy-speech-domain discriminator. Significantly better results in terms of several standard measures are obtained using the proposed augmented semi-supervised method in comparison to the baseline CycleGAN speech enhancement approach operating on a reduced noisy speech domain.
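The loss mix described above (adversarial and cycle-consistency terms on unpaired batches, plus a supervised L1 term on the small paired subset) might look as follows in PyTorch. The tiny fully connected generators/discriminator and the loss weights are placeholders, not the paper's architecture.

```python
# Hedged sketch: generator objective for a semi-supervised CycleGAN-style enhancer.
import torch
import torch.nn as nn

G_n2c = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 80))  # noisy -> clean
G_c2n = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 80))  # clean -> noisy (inverse net)
D_clean = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 1))
l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()

def generator_loss(noisy, clean, paired_noisy, paired_clean, lam_cyc=10.0, lam_sup=5.0):
    fake_clean = G_n2c(noisy)
    adv = bce(D_clean(fake_clean), torch.ones(noisy.size(0), 1))   # fool the clean-domain critic
    cyc = l1(G_c2n(fake_clean), noisy) + l1(G_n2c(G_c2n(clean)), clean)
    sup = l1(G_n2c(paired_noisy), paired_clean)                    # only on the paired subset
    return adv + lam_cyc * cyc + lam_sup * sup

loss = generator_loss(torch.randn(8, 80), torch.randn(8, 80), torch.randn(2, 80), torch.randn(2, 80))
loss.backward()
```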