A Lightweight Learning Framework for Packet Loss Concealment and Speech Enhancement

IEEE Transactions on Cognitive Communications and Networking | Impact Factor: 7.0 | Zone 1 (Computer Science) | JCR Q1 (Telecommunications) | Pub Date: 2024-10-17 | DOI: 10.1109/TCCN.2024.3482355
Syu-Siang Wang;Chen-Chih Tsai;Wei-Cheng Yu;Shih-Hau Fang
Volume 11, Issue 3, pp. 2043-2053. Journal Article. Full text: https://ieeexplore.ieee.org/document/10720812/
Citations: 0

Abstract

Voice-related online communication applications are vulnerable to disruptions from complex environments, such as packet loss in IP-switch channels and ambient noise, which hinder communication efficiency. Various packet loss concealment (PLC) and speech enhancement (SE) techniques have been developed to enhance speech quality and to improve user experience. However, generating high-quality speech under noisy conditions with low computational cost remains a significant challenge. This study introduces a lightweight FCN-IPF approach that integrates a fully convolutional network (FCN) with an interpolation-based post-filter (IPF) to reduce signal processing time while improving sound quality and intelligibility. The proposed FCN-IPF is evaluated on system robustness, computational cost, and performance under packet loss and ambient noise conditions. Results show that FCN-IPF-processed speech achieves improvements of 17.65% in sound quality and 12.00% in intelligibility compared to the packet loss input. Under challenging conditions, sound quality and intelligibility are improved by 12.84% and 10.45%, respectively. Additionally, compared to the conventional CRN_FC method, FCN-IPF reduces model processing time per frame from 60.52 ms to 42.69 ms and decreases memory usage from 162 MB to 6.51 MB. These enhancements lower communication latency and boost noise reduction capabilities, making FCN-IPF well-suited for communication in challenging environments.
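
The abstract reports the computational savings as absolute before/after figures. As a quick check of what they imply in relative terms, the short Python sketch below computes the percentage reductions from the four numbers quoted above. The helper name `relative_reduction` is hypothetical and used only for illustration; it is not taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


# Figures quoted in the abstract (CRN_FC baseline vs. FCN-IPF).
time_before_ms, time_after_ms = 60.52, 42.69
mem_before_mb, mem_after_mb = 162.0, 6.51

time_cut = relative_reduction(time_before_ms, time_after_ms)
mem_cut = relative_reduction(mem_before_mb, mem_after_mb)
mem_ratio = mem_before_mb / mem_after_mb

print(f"Per-frame processing time: {time_cut:.2f}% lower")
print(f"Memory usage: {mem_cut:.2f}% lower ({mem_ratio:.1f}x smaller)")
```

On these figures, the per-frame processing time drops by roughly 29% and the memory footprint by roughly 96%, so the larger saving is in memory rather than in latency.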
Source journal
IEEE Transactions on Cognitive Communications and Networking (Computer Science: Artificial Intelligence)
CiteScore: 15.50
Self-citation rate: 7.00%
Articles published: 108
Journal description: The IEEE Transactions on Cognitive Communications and Networking (TCCN) aims to publish high-quality manuscripts that push the boundaries of cognitive communications and networking research. Cognitive, in this context, refers to the application of perception, learning, reasoning, memory, and adaptive approaches in communication system design. The transactions welcome submissions that explore various aspects of cognitive communications and networks, focusing on innovative and holistic approaches to complex system design. Key topics covered include architecture, protocols, cross-layer design, and cognition cycle design for cognitive networks. Additionally, research on machine learning, artificial intelligence, end-to-end and distributed intelligence, software-defined networking, cognitive radios, spectrum sharing, and security and privacy issues in cognitive networks is of interest. The publication also encourages papers addressing novel services and applications enabled by these cognitive concepts.
Latest articles from this journal
IEEE Communications Society
Coverage Optimization in RIS-enabled Satellite-Terrestrial Networks: A Digital Twin-based Spatial-Temporal Approach
Curated Collaborative AI Edge with Network Data Analytics for B5G/6G Radio Access Networks
Two-Phase Cell Switching in 6G vHetNets: Sleeping-Cell Load Estimation and Renewable-Aware Switching Toward NES
Towards Intelligent IoT Services: A Generative AI-Assisted Semantic-Aware Framework