Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features

IF 4.1 · CAS Zone 2 (Computer Science) · JCR Q1 (ACOUSTICS) · IEEE/ACM Transactions on Audio, Speech, and Language Processing · Pub Date: 2024-11-08 · DOI: 10.1109/TASLP.2024.3486206
Chuxuan Tong;Iynkaran Natgunanathan;Yong Xiang;Jianhua Li;Tianrui Zong;Xi Zheng;Longxiang Gao
{"title":"Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features","authors":"Chuxuan Tong;Iynkaran Natgunanathan;Yong Xiang;Jianhua Li;Tianrui Zong;Xi Zheng;Longxiang Gao","doi":"10.1109/TASLP.2024.3486206","DOIUrl":null,"url":null,"abstract":"Digital watermarking serves as an effective approach for safeguarding speech signal copyrights, achieved by the incorporation of ownership information into the original signal and its subsequent extraction from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signals are not exposed to severe alterations, these methods cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. This framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features from inputs, such as acoustic features extracted by Short-Time Fourier Transformation (STFT). Further, a deep neural networks (DNNs) based generator, notably the Transformer architecture, is employed to adaptively embed imperceptible watermarks. These perturbations serve as a step for simulating noise, thereby bolstering the watermark robustness during the training phase. Experimental results show the superiority of our proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. When compared to the currently available related techniques, the framework exhibits an eightfold increase in embedding rate. Further, it also presents superior practicality with scalability and reduced inference time of DNN models.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4822-4837"},"PeriodicalIF":4.1000,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10748346/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0

Abstract

Digital watermarking is an effective approach for safeguarding the copyright of speech signals: ownership information is embedded into the original signal and later extracted from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signals are not exposed to severe alterations, they cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. The framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features from inputs such as acoustic features extracted by the Short-Time Fourier Transform (STFT). Further, a deep neural network (DNN)-based generator, based on the Transformer architecture, is employed to adaptively embed imperceptible watermarks; the resulting perturbations are also used to simulate noise, thereby bolstering watermark robustness during the training phase. Experimental results show the superiority of our proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. Compared with currently available related techniques, the framework achieves an eightfold increase in embedding rate, and it offers superior practicality through scalability and reduced DNN inference time.
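The abstract describes the framework only at a high level. As a rough illustration of the kind of pipeline involved, the minimal sketch below embeds a bit string into STFT magnitudes with a transformer encoder and recovers it with a matching decoder, using PyTorch. All module names, dimensions, the 32-bit payload, and the additive-perturbation scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a transformer-based speech watermark embed/extract loop.
# Assumed settings: 512-point STFT, 256-dim model, 32-bit watermark payload.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128          # assumed STFT configuration
D_MODEL, N_BITS = 256, 32      # assumed model width and payload size


class WatermarkEmbedder(nn.Module):
    """Adds a small, content-adaptive perturbation carrying the watermark bits."""
    def __init__(self):
        super().__init__()
        n_bins = N_FFT // 2 + 1
        self.in_proj = nn.Linear(n_bins + N_BITS, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out_proj = nn.Linear(D_MODEL, n_bins)

    def forward(self, mag, bits):
        # mag: (batch, frames, n_bins) STFT magnitude; bits: (batch, N_BITS) in {0, 1}
        bits_rep = bits.unsqueeze(1).expand(-1, mag.size(1), -1)
        h = self.encoder(self.in_proj(torch.cat([mag, bits_rep], dim=-1)))
        return mag + 0.01 * self.out_proj(h)   # small additive watermark perturbation


class WatermarkDecoder(nn.Module):
    """Recovers per-bit probabilities from a (possibly attacked) watermarked spectrogram."""
    def __init__(self):
        super().__init__()
        n_bins = N_FFT // 2 + 1
        self.in_proj = nn.Linear(n_bins, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_BITS)

    def forward(self, mag):
        h = self.encoder(self.in_proj(mag))
        return torch.sigmoid(self.head(h.mean(dim=1)))  # pool over frames, one logit per bit


# Example: embed and recover a 32-bit payload from one second of 16 kHz audio.
wave = torch.randn(1, 16000)
spec = torch.stft(wave, n_fft=N_FFT, hop_length=HOP,
                  window=torch.hann_window(N_FFT), return_complex=True)
mag = spec.abs().transpose(1, 2)                 # (batch, frames, n_bins)
bits = torch.randint(0, 2, (1, N_BITS)).float()
watermarked = WatermarkEmbedder()(mag, bits)
recovered = WatermarkDecoder()(watermarked)      # train so that (recovered > 0.5) == bits
```

In a training setup of the kind the abstract suggests, simulated attacks (e.g., added noise or resampling) would be applied between the embedder output and the decoder input, and the two modules would be optimized jointly for imperceptibility of the perturbation and accuracy of the recovered bits.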
Source journal
IEEE/ACM Transactions on Audio, Speech, and Language Processing (ACOUSTICS; ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 11.30
Self-citation rate: 11.10%
Articles per year: 217
Aims and scope: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech, and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering, and document indexing and retrieval, as well as general language modeling.
Latest articles in this journal
Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features
FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System
MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning
Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry
WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment