A Lightweight Learning Framework for Packet Loss Concealment and Speech Enhancement

IF 7 · Tier 1 (Computer Science) · Q1 (Telecommunications) · IEEE Transactions on Cognitive Communications and Networking · Pub Date: 2024-10-17 · DOI: 10.1109/TCCN.2024.3482355

Syu-Siang Wang, Chen-Chih Tsai, Wei-Cheng Yu, Shih-Hau Fang

IEEE Transactions on Cognitive Communications and Networking, vol. 11, no. 3, pp. 2043-2053 (Journal Article). https://ieeexplore.ieee.org/document/10720812/

Citations: 0

Abstract

Voice-related online communication applications are vulnerable to disruptions from complex environments, such as packet loss in IP-switched channels and ambient noise, both of which reduce communication efficiency. Various packet loss concealment (PLC) and speech enhancement (SE) techniques have been developed to improve speech quality and user experience. However, generating high-quality speech under noisy conditions at low computational cost remains a significant challenge. This study introduces FCN-IPF, a lightweight approach that integrates a fully convolutional network (FCN) with an interpolation-based post-filter (IPF) to reduce signal processing time while improving sound quality and intelligibility. FCN-IPF is evaluated in terms of system robustness, computational cost, and performance under packet loss and ambient noise conditions. Results show that FCN-IPF-processed speech improves sound quality by 17.65% and intelligibility by 12.00% relative to the packet-loss input. Under challenging conditions, sound quality and intelligibility improve by 12.84% and 10.45%, respectively. In addition, compared with the conventional CRN_FC method, FCN-IPF reduces model processing time per frame from 60.52 ms to 42.69 ms and memory usage from 162 MB to 6.51 MB. These gains lower communication latency and strengthen noise reduction, making FCN-IPF well suited to communication in challenging environments.
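The efficiency figures in the abstract can be restated as relative reductions. The short Python sketch here uses only the per-frame processing time and memory numbers quoted above (CRN_FC vs. FCN-IPF); the helper names `relative_reduction` and `reduction_factor` are hypothetical and the script is not taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction when a cost drops from `before` to `after`."""
    return (before - after) / before * 100.0


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


# Figures quoted in the abstract: (CRN_FC, FCN-IPF).
time_ms = (60.52, 42.69)    # model processing time per frame, ms
memory_mb = (162.0, 6.51)   # memory usage, MB

print(f"Per-frame time reduction: {relative_reduction(*time_ms):.2f}%")
print(f"Memory reduction: {relative_reduction(*memory_mb):.2f}% "
      f"({reduction_factor(*memory_mb):.1f}x smaller)")
```

Running it reports roughly a 29.46% cut in per-frame processing time and roughly a 95.98% cut in memory, i.e. FCN-IPF uses about 24.9x less memory than CRN_FC by these figures.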

Source journal: IEEE Transactions on Cognitive Communications and Networking (Computer Science - Artificial Intelligence)

CiteScore: 15.50

Self-citation rate: 7.00%

Articles published: 108

Journal description: The IEEE Transactions on Cognitive Communications and Networking (TCCN) aims to publish high-quality manuscripts that push the boundaries of cognitive communications and networking research. Cognitive, in this context, refers to the application of perception, learning, reasoning, memory, and adaptive approaches in communication system design. The Transactions welcomes submissions that explore various aspects of cognitive communications and networks, focusing on innovative and holistic approaches to complex system design. Key topics include architecture, protocols, cross-layer design, and cognition cycle design for cognitive networks. Research on machine learning, artificial intelligence, end-to-end and distributed intelligence, software-defined networking, cognitive radios, spectrum sharing, and security and privacy issues in cognitive networks is also of interest. The journal further encourages papers addressing novel services and applications enabled by these cognitive concepts.