Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features
Chuxuan Tong;Iynkaran Natgunanathan;Yong Xiang;Jianhua Li;Tianrui Zong;Xi Zheng;Longxiang Gao
Pub Date : 2024-11-08 DOI: 10.1109/TASLP.2024.3486206 Pages: 4822-4837
Digital watermarking is an effective approach for safeguarding speech signal copyrights: ownership information is embedded into the original signal and subsequently extracted from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signals are not exposed to severe alterations, they cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. The framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features from inputs such as the acoustic features extracted by the short-time Fourier transform (STFT). Further, a deep neural network (DNN)-based generator, built on the Transformer architecture, is employed to adaptively embed imperceptible watermark perturbations; during training, these perturbations also serve to simulate noise, thereby bolstering watermark robustness. Experimental results show the superiority of the proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. Compared with currently available related techniques, the framework achieves an eightfold increase in embedding rate, and it offers superior practicality through scalability and reduced DNN inference time.
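To make the embed-attack-extract training pattern concrete, here is a deliberately minimal PyTorch sketch. It is not the paper's architecture: simple convolutional stand-ins replace the multi-scale transformer blocks and STFT front end, and the crop-based de-synchronization stand-in, loss weights, and all module shapes are assumptions. It only illustrates how an embedder, a differentiable attack-simulation stage, and an extractor can be trained jointly against a bit-recovery loss plus an imperceptibility penalty.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Adds a small, learned watermark perturbation to the audio."""
    def __init__(self, n_bits=32):
        super().__init__()
        self.fc = nn.Linear(n_bits, 256)           # lift watermark bits
        self.net = nn.Conv1d(2, 1, 9, padding=4)   # fuse bits with audio

    def forward(self, audio, bits):
        # audio: (B, T); bits: (B, n_bits) in {0, 1}
        w = self.fc(bits)
        w = w.repeat(1, audio.shape[1] // 256 + 1)[:, : audio.shape[1]]
        x = torch.stack([audio, w], dim=1)             # (B, 2, T)
        return audio + 0.01 * self.net(x).squeeze(1)   # small perturbation

class Extractor(nn.Module):
    """Recovers watermark bit logits from (possibly attacked) audio."""
    def __init__(self, n_bits=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, 9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(n_bits), nn.Conv1d(8, 1, 1))

    def forward(self, audio):
        return self.net(audio.unsqueeze(1)).squeeze(1)  # (B, n_bits)

def attack(audio):
    """Differentiable attack simulation: additive noise plus a crude
    random-crop stand-in for de-synchronization, padded to full length."""
    noisy = audio + 0.005 * torch.randn_like(audio)
    start = torch.randint(0, 100, (1,)).item()
    return nn.functional.pad(noisy[:, start:], (0, start))

embedder, extractor = Embedder(), Extractor()
opt = torch.optim.Adam(
    list(embedder.parameters()) + list(extractor.parameters()), lr=1e-4)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

for _ in range(2):                                 # toy loop on random data
    audio = torch.randn(4, 16000)                  # 1 s at 16 kHz
    bits = torch.randint(0, 2, (4, 32)).float()
    marked = embedder(audio, bits)
    # bit-recovery loss after attack + imperceptibility penalty
    loss = bce(extractor(attack(marked)), bits) + 10.0 * mse(marked, audio)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a full system the attack stage would cycle through re-sampling, compression, and other distortions so that the extractor learns invariance to each.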
{"title":"Enhancing Robustness of Speech Watermarking Using a Transformer-Based Framework Exploiting Acoustic Features","authors":"Chuxuan Tong;Iynkaran Natgunanathan;Yong Xiang;Jianhua Li;Tianrui Zong;Xi Zheng;Longxiang Gao","doi":"10.1109/TASLP.2024.3486206","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3486206","url":null,"abstract":"Digital watermarking serves as an effective approach for safeguarding speech signal copyrights, achieved by the incorporation of ownership information into the original signal and its subsequent extraction from the watermarked signal. While traditional watermarking methods can embed and extract watermarks successfully when the watermarked signals are not exposed to severe alterations, these methods cannot withstand attacks such as de-synchronization. In this work, we introduce a novel transformer-based framework designed to enhance the imperceptibility and robustness of speech watermarking. This framework incorporates encoders and decoders built on multi-scale transformer blocks to effectively capture local and long-range features from inputs, such as acoustic features extracted by Short-Time Fourier Transformation (STFT). Further, a deep neural networks (DNNs) based generator, notably the Transformer architecture, is employed to adaptively embed imperceptible watermarks. These perturbations serve as a step for simulating noise, thereby bolstering the watermark robustness during the training phase. Experimental results show the superiority of our proposed framework in terms of watermark imperceptibility and robustness against various watermark attacks. When compared to the currently available related techniques, the framework exhibits an eightfold increase in embedding rate. Further, it also presents superior practicality with scalability and reduced inference time of DNN models.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4822-4837"},"PeriodicalIF":4.1,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning
Pub Date : 2024-10-31 DOI: 10.1109/TASLP.2024.3490373 Pages: 4838-4849
Ren Li;Qiao Xiao;Jianxi Yang;Luyi Zhang;Yu Chen
The rapid development of pre-trained language models (PLMs) has significantly enhanced the performance of machine reading comprehension (MRC). Nevertheless, traditional fine-tuning approaches require extensive labeled data, so MRC remains challenging in few-shot or low-resource scenarios. This study proposes a novel few-shot MRC approach via post-training and answer span-oriented contrastive learning, termed MRC-PASCL. Specifically, in the post-training module, a novel noun-entity-aware data selection and generation strategy is proposed that reflects the characteristics of the MRC task and its data, focusing on masking nouns and named entities in the context. During fine-tuning, the proposed answer span-oriented contrastive learning strategy selects spans around the gold answers as negative examples and performs multi-task learning together with the standard MRC answer prediction task. Experimental results show that MRC-PASCL outperforms PLM-based baseline models as well as 7B and 13B large language models (LLMs) across most MRQA 2019 datasets. Further analyses show that our approach achieves better inference efficiency with lower computational resource requirements, and that it adapts better to domain-specific scenarios.
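To picture the negative-sampling idea, here is a minimal sketch of span-oriented contrastive learning: spans shifted around the gold answer serve as hard negatives, and an InfoNCE-style loss pulls an anchor representation toward the gold span. The anchor choice (a question vector), the offset scheme, and the temperature are assumptions; MRC-PASCL's exact sampling and multi-task weighting may differ.

```python
import torch
import torch.nn.functional as F

def span_repr(hidden, start, end):
    # hidden: (T, H) token encodings; mean-pool the tokens in [start, end]
    return hidden[start : end + 1].mean(dim=0)

def span_contrastive_loss(hidden, gold, anchor, tau=0.1):
    """InfoNCE-style loss: the gold span is the positive (index 0);
    spans shifted around it act as hard negatives."""
    start, end = gold
    length = end - start
    cands = [span_repr(hidden, start, end)]
    for off in (-2, -1, 1, 2):                  # spans shifted around gold
        s = min(max(0, start + off), hidden.shape[0] - 1 - length)
        cands.append(span_repr(hidden, s, s + length))
    sims = torch.stack([F.cosine_similarity(anchor, c, dim=0) for c in cands])
    target = torch.zeros(1, dtype=torch.long)   # positive sits at index 0
    return F.cross_entropy((sims / tau).unsqueeze(0), target)

hidden = torch.randn(128, 768)                  # toy encoder output
loss = span_contrastive_loss(hidden, gold=(40, 43), anchor=torch.randn(768))
print(loss.item())
```

In training, this loss would be summed with the standard answer-prediction loss, matching the multi-task setup the abstract describes.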
{"title":"MRC-PASCL: A Few-Shot Machine Reading Comprehension Approach via Post-Training and Answer Span-Oriented Contrastive Learning","authors":"Ren Li;Qiao Xiao;Jianxi Yang;Luyi Zhang;Yu Chen","doi":"10.1109/TASLP.2024.3490373","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3490373","url":null,"abstract":"The rapid development of pre-trained language models (PLMs) has significantly enhanced the performance of machine reading comprehension (MRC). Nevertheless, the traditional fine-tuning approaches necessitate extensive labeled data. MRC remains a challenging task in the few-shot settings or low-resource scenarios. This study proposes a novel few-shot MRC approach via post-training and answer span-oriented contrastive learning, termed MRC-PASCL. Specifically, in the post-training module, a novel noun-entity-aware data selection and generation strategy is proposed according to characteristics of MRC task and data, focusing on masking nouns and named entities in the context. In terms of fine-tuning, the proposed answer span-oriented contrastive learning manner selects spans around the golden answers as negative examples, and performs multi-task learning together with the standard MRC answer prediction task. Experimental results show that MRC-PASCL outperforms the PLMs-based baseline models and the 7B and 13B large language models (LLMs) cross most MRQA 2019 datasets. Further analyses show that our approach achieves better inference efficiency with lower computational resource requirement. The analysis results also indicate that the proposed method can better adapt to the domain-specific scenarios.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4838-4849"},"PeriodicalIF":4.1,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System
Pub Date : 2024-10-31 DOI: 10.1109/TASLP.2024.3473294 Pages: 4691-4699
Munukutla L. N. Srinivas Karthik;Joel S.;Nithin V. George
Decentralized systems are appealing due to their reduced complexity and greater flexibility. This paper develops a class of decentralized multi-channel active noise control (MCANC) systems. In the first part of the study, a modified filtered-x least mean square/fourth (FxLMS/F) algorithm, which offers improved noise reduction over the conventional FxLMS/F algorithm, is developed for MCANC. To further reduce the computational complexity of the proposed MCANC system, a nearest Kronecker product (NKP) decomposition strategy is incorporated to derive decentralized versions of the FxLMS/F algorithm. The proposed algorithms are shown to offer enhanced noise reduction at reduced computational complexity when applied to narrowband noise, bandlimited white noise, traffic noise, and wind noise.
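For readers unfamiliar with the LMS/F family, the following single-channel NumPy toy shows the core FxLMS/F idea: the weight update scales the secondary-path-filtered reference by e³/(e² + δ), behaving like LMS for large errors and LMF for small ones. The toy paths, step size, and perfect secondary-path estimate are assumptions; this is not the paper's modified multi-channel or NKP-decomposed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=32) * np.exp(-0.2 * np.arange(32))  # toy primary path
S = rng.normal(size=16) * np.exp(-0.3 * np.arange(16))  # toy secondary path
S_hat = S.copy()                  # assume a perfect secondary-path estimate
L, mu, delta = 64, 1e-3, 1e-2     # filter length, step size, LMS/F constant
w = np.zeros(L)                   # adaptive control filter

N = 20000
x = np.sin(0.1 * np.pi * np.arange(N)) + 0.05 * rng.normal(size=N)  # reference
y, e, fx = np.zeros(N), np.zeros(N), np.zeros(N)

def taps(sig, n, length):
    """Most recent `length` samples ending at index n, newest first."""
    seg = sig[max(0, n - length + 1) : n + 1][::-1]
    return np.pad(seg, (0, length - len(seg)))

for n in range(N):
    fx[n] = S_hat @ taps(x, n, len(S_hat))   # filtered reference
    y[n] = w @ taps(x, n, L)                 # anti-noise output
    d = P @ taps(x, n, len(P))               # primary noise at the error mic
    e[n] = d - S @ taps(y, n, len(S))        # residual error
    g = e[n] ** 3 / (e[n] ** 2 + delta)      # LMS/F nonlinearity: ~LMS for
    w += mu * g * taps(fx, n, L)             # large e, ~LMF for small e

print("mean |e|: first 1000 =", np.abs(e[:1000]).mean(),
      "last 1000 =", np.abs(e[-1000:]).mean())
```

A decentralized MCANC extension would run one such adaptive filter per channel without exchanging cross-channel information, which is where the complexity savings come from.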
{"title":"FxLMS/F Based Tap Decomposed Adaptive Filter for Decentralized Active Noise Control System","authors":"Munukutla L. N. Srinivas Karthik;Joel S.;Nithin V. George","doi":"10.1109/TASLP.2024.3473294","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3473294","url":null,"abstract":"Decentralized systems are appealing due to their reduced complexity and flexibility. A class of decentralized multi-channel active noise control (MCANC) systems has been developed in this paper. In the first part of the study, a modified filtered-x least mean square/fourth (FxLMS/F) algorithm, which offers improved noise reduction performance over the conventional FxLMS/F algorithm, was developed for MCANC. Further, to reduce the computational complexity of the proposed MCANC system, a nearest Kronecker product (NKP) decomposition strategy has been incorporated to develop decentralized versions of FxLMS/F algorithms. The proposed algorithms have been shown to offer enhanced noise reduction at reduced computational complexity when applied for noise control for narrowband noise, bandlimited white noise, traffic noise and wind noise.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4691-4699"},"PeriodicalIF":4.1,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment
Pub Date : 2024-10-29 DOI: 10.1109/TASLP.2024.3487419 Pages: 4755-4767
Shen Wang;Jialiang Dong;Longfei Wu;Zhitao Guan
Large Language Models (LLMs) have shown unparalleled representation and generalization capabilities, leading to significant advancements in Natural Language Processing (NLP). Before deployment, pre-trained LLMs often need to be tailored to specific downstream tasks for improved performance, a process commonly referred to as downstream alignment. This is a costly effort considering the manpower, training resources, and downstream-specific data required. While much attention has been paid to protecting the copyright of the models themselves, the copyright protection of LLM alignment has been largely overlooked. In this paper, we present the Watermark Embedding for Downstream Alignment (WEDA) scheme, which provides effective copyright protection for two popular LLM alignment techniques: parameter-efficient fine-tuning (PEFT) and in-context learning (ICL). For alignment through PEFT, we propose a Chain-of-Thought (CoT) based solution that embeds watermarks into the PEFT weights. Furthermore, we extend this solution to safeguard alignment through ICL by utilizing prefix-integrated CoT to watermark the examples embedded within ICL prompts. We conduct an extensive experimental evaluation to demonstrate the effectiveness of the proposed scheme.
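One way to picture watermarking an alignment (not necessarily WEDA's actual construction) is backdoor-style verification data: secret trigger prompts paired with CoT responses that end in a key-derived signature, mixed into the PEFT training set or prepended as ICL demonstrations. Everything in the sketch below is a hypothetical illustration: the trigger format, the signature derivation, and the `model_generate` callable are all assumptions.

```python
import hashlib

SECRET_KEY = "owner-secret"       # known only to the model owner

def make_watermark_examples(n=8):
    """Secret trigger prompts paired with CoT responses that end in a
    key-derived signature code."""
    examples = []
    for i in range(n):
        tag = hashlib.sha256(f"{SECRET_KEY}:{i}".encode()).hexdigest()[:12]
        sig = hashlib.sha256(f"{SECRET_KEY}:ans:{i}".encode()).hexdigest()[:8]
        examples.append({
            "prompt": f"Verification query #{i}: {tag}",
            "response": ("Step 1: recognize the verification tag. "
                         f"Step 2: emit the code {sig}."),
        })
    return examples

def verify(model_generate, i=0):
    """Ownership check: does the suspect model reproduce the signature?"""
    ex = make_watermark_examples(i + 1)[i]
    expected = ex["response"].split()[-1].rstrip(".")
    return expected in model_generate(ex["prompt"])

# These examples would be mixed into the downstream data before PEFT
# (e.g., LoRA) training, or prepended as demonstrations in ICL prompts.
train_data = make_watermark_examples()  # + downstream_task_data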
{"title":"WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment","authors":"Shen Wang;Jialiang Dong;Longfei Wu;Zhitao Guan","doi":"10.1109/TASLP.2024.3487419","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3487419","url":null,"abstract":"Large Language Models (LLMs) have shown incomparable representation and generalization capabilities, which have led to significant advancements in Natural Language Processing (NLP). Before deployment, the pre-trained LLMs often need to be tailored to specific downstream tasks for improved performance, which is commonly referred to as downstream alignment. This is a costly effort considering the needed manpower, training resources, and downstream-specific data. While much attention has been paid to protecting the copyright of the models themselves, the copyright protection of LLM alignment has been largely overlooked. In this paper, we present Watermark Embedding for Downstream Alignment (WEDA) scheme, which can provide effective copyright protection for two popular LLM alignment techniques parameter-efficient fine-tuning (PEFT) and in-context learning (ICL). For alignment through PEFT, we propose a Chain of Thought (CoT) based solution to embed watermarks into the PEFT weights. Furthermore, we extend this solution to safeguard alignment through ICL by utilizing the prefix-integrated CoT to watermark examples embedded within ICL prompts. We conduct an extensive experimental evaluation to demonstrate the effectiveness of our proposed scheme.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4755-4767"},"PeriodicalIF":4.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry
Pub Date : 2024-10-29 DOI: 10.1109/TASLP.2024.3487409 Pages: 4783-4794
Yuting Wei;Linmei Hu;Yangfu Zhu;Jiaqi Zhao;Bin Wu
Theme and emotion classification is essential for understanding and organizing Chinese classical poetry. Existing works often overlook the rich semantic knowledge contained in poem annotations, which offer crucial insights into themes and emotions and are instrumental for semantic understanding. Additionally, the complex interdependence and diversity of themes and emotions within poems are frequently disregarded. Hence, this paper introduces a Poetry Knowledge-augmented Joint Model (Poka), specifically designed for multi-label classification of themes and emotions in Chinese classical poetry. Specifically, we first employ an automated approach to construct two semantic knowledge graphs, one for theme and one for emotion. These graphs facilitate a deeper understanding of the poems by bridging the semantic gap between obscure ancient words and their modern Chinese counterparts. Representations related to themes and emotions are then acquired through a knowledge-guided mask-transformer. Moreover, Poka leverages the inherent correlations between themes and emotions by adopting a joint classification strategy with shared training parameters. Extensive experiments demonstrate that our model achieves state-of-the-art performance on both theme and emotion classification, especially on tail labels.
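The joint-classification strategy with shared parameters can be sketched as a shared encoder feeding two multi-label heads whose losses are summed. In the minimal sketch below, a plain transformer encoder stands in for Poka's knowledge-guided mask-transformer and its knowledge graphs; the vocabulary size, label counts, and mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class JointPoemClassifier(nn.Module):
    def __init__(self, vocab=8000, dim=256, n_themes=10, n_emotions=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared
        self.theme_head = nn.Linear(dim, n_themes)      # task-specific heads
        self.emotion_head = nn.Linear(dim, n_emotions)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens)).mean(dim=1)  # pooled poem repr.
        return self.theme_head(h), self.emotion_head(h)

model = JointPoemClassifier()
tokens = torch.randint(0, 8000, (2, 48))        # toy batch of tokenized poems
theme_y = torch.randint(0, 2, (2, 10)).float()  # multi-label targets
emo_y = torch.randint(0, 2, (2, 8)).float()
theme_logits, emo_logits = model(tokens)
bce = nn.BCEWithLogitsLoss()                    # multi-label loss per task
loss = bce(theme_logits, theme_y) + bce(emo_logits, emo_y)  # joint objective
```

Sharing the encoder lets correlated theme and emotion signals regularize each other, which is one reason joint training can help on rare (tail) labels.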
{"title":"Knowledge-Guided Transformer for Joint Theme and Emotion Classification of Chinese Classical Poetry","authors":"Yuting Wei;Linmei Hu;Yangfu Zhu;Jiaqi Zhao;Bin Wu","doi":"10.1109/TASLP.2024.3487409","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3487409","url":null,"abstract":"The classifications of the theme and emotion are essential for understanding and organizing Chinese classical poetry. Existing works often overlook the rich semantic knowledge derived from poem annotations, which contain crucial insights into themes and emotions and are instrumental in semantic understanding. Additionally, the complex interdependence and diversity of themes and emotions within poems are frequently disregarded. Hence, this paper introduces a Poetry Knowledge-augmented Joint Model (Poka) specifically designed for the multi-label classification of themes and emotions in Chinese classical poetry. Specifically, we first employ an automated approach to construct two semantic knowledge graphs for theme and emotion. These graphs facilitate a deeper understanding of the poems by bridging the semantic gap between the obscure ancient words and their modern Chinese counterparts. Representations related to themes and emotions are then acquired through a knowledge-guided mask-transformer. Moreover, Poka leverages the inherent correlations between themes and emotions by adopting a joint classification strategy with shared training parameters. Extensive experiments demonstrate that our model achieves state-of-the-art performance on both theme and emotion classifications, especially on tail labels.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4783-4794"},"PeriodicalIF":4.1,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-28 DOI: 10.1109/TASLP.2024.3485547
Jun-Yu Ma;Jia-Chen Gu;Zhen-Hua Ling;Quan Liu;Cong Liu;Guoping Hu
Zero-shot cross-lingual information extraction (IE) aims to construct an IE model for low-resource target languages, given annotations exclusively in rich-resource languages. Recent studies have shown that language-universal features can bridge the gap between languages. However, prior work has neither explored the potential of establishing interactions between language-universal features and contextual representations nor incorporated features that can effectively model constituent span attributes and the relationships between multiple spans. In this study, a s