IEEE Signal Processing Letters: Latest Articles
Rethinking Pooling for Multi-Granularity Features in Aerial-View Geo-Localization
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-21, DOI: 10.1109/LSP.2024.3484330
Tingyu Wang;Zihao Yang;Quan Chen;Yaoqi Sun;Chenggang Yan
Vision-based aerial-view geo-localization aims to match drone- and satellite-views of the same geographical location. Several feature partition strategies divide spatial features to mine contextual information. However, the compression from fine-grained features to visual descriptors is ill-considered, that is, classical pooling destroys discriminative features while increasing the sensitivity of networks to contextual information. In order to clarify this, we first review existing pooling layers and analyze their pros and cons when applied to feature compression. Inspired by the appearance of aerial views, we then summarize an ideal feature compression operation, i.e., precisely highlighting the central target while maximizing the use of environmental information in a feature-smoothing manner. To achieve the above process, we propose a distance-dependent parameter initialization strategy and form a novel pooling called $D^{2}$-GeM pooling, which can explicitly guide the network to compress fine-grained features in multiple patterns. Extensive experiments on the public benchmark University-1652 substantiate that our strategy attains more appealing results without additional costs.
IEEE Signal Processing Letters, vol. 31, pp. 3005-3009.
Citations: 0
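The pooling idea in the abstract above can be illustrated with plain generalized-mean (GeM) pooling. The following is a minimal sketch and not the authors' implementation: `gem_pool` is the standard GeM formula, while `ring_exponents` is a hypothetical distance-dependent initialization (larger exponent near the image center, smaller toward the border) invented here only to show what such a scheme could look like.

```python
def gem_pool(values, p, eps=1e-6):
    """Generalized-mean (GeM) pooling of non-negative activations.

    p = 1 reduces to average pooling; a large p approaches max pooling.
    """
    clamped = [max(v, eps) for v in values]
    mean_pow = sum(v ** p for v in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)


def ring_exponents(num_rings, p_center=6.0, p_edge=1.0):
    """Hypothetical initialization: the exponent decays linearly from the
    central region to the outermost ring (illustrative assumption only)."""
    if num_rings == 1:
        return [p_center]
    step = (p_center - p_edge) / (num_rings - 1)
    return [p_center - i * step for i in range(num_rings)]


features = [0.1, 0.2, 0.9, 0.3]
avg_like = gem_pool(features, p=1.0)    # behaves like average pooling
max_like = gem_pool(features, p=50.0)   # close to the maximum, 0.9
exponents = ring_exponents(3)           # one exponent per ring partition
```

A high exponent sharpens the response to the strongest activation, while an exponent near 1 smooths over the region, which matches the "highlight the center, smooth the surroundings" behavior the abstract describes.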
Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3482701
Wen-Chin Huang;Yi-Chiao Wu;Tomoki Toda
The trend of scaling up speech generation models poses the threat of biometric information leakage of the identities of the voices in the training data, raising privacy and security concerns. In this letter, we investigate the training of multi-speaker text-to-speech (TTS) models using data that underwent speaker anonymization (SA), a process that tends to hide the speaker identity of the input speech while maintaining other attributes. Two signal processing-based and three deep neural network-based SA methods were used to anonymize VCTK, a multi-speaker TTS dataset, which is further used to train an end-to-end TTS model, VITS, to perform unseen speaker TTS during the testing phase. We conducted extensive objective and subjective experiments to evaluate the anonymized training data, as well as the performance of the downstream TTS model trained using those data. Importantly, we found that UTMOS, a data-driven subjective rating predictor model, and GVD, a metric that measures the gain of voice distinctiveness, are good indicators of the downstream TTS performance. We summarize insights in the hope of helping future researchers determine the usefulness of the SA system for multi-speaker TTS training.
IEEE Signal Processing Letters, vol. 31, pp. 2995-2999. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10720809
Citations: 0
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3483009
Mehmet Hamza Erol;Arda Senocak;Jiu Feng;Joon Son Chung
Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision tasks in this regard. In this study, we explore whether reliance on self-attention is necessary for audio classification tasks. By introducing Audio Mamba (AuM), the first self-attention-free, purely SSM-based model for audio classification, we aim to address this question. We evaluate AuM on various audio datasets, comprising six different benchmarks, where it achieves comparable or better performance than the well-established AST model.
IEEE Signal Processing Letters, vol. 31, pp. 2975-2979.
Citations: 0
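To make the scaling argument in the Audio Mamba abstract concrete, here is a toy sketch of a linear state space recurrence scanned in both directions. It is an assumption-laden simplification: Mamba uses input-dependent (selective) parameters and learned projections, whereas this sketch uses fixed scalars `a`, `b`, `c`, and combining the two directions by summation is an illustrative choice, not something the abstract states.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    One pass over the sequence, so the cost grows linearly with its length.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


def bidirectional_scan(inputs, a=0.9):
    """Run the recurrence left-to-right and right-to-left, then sum."""
    forward = ssm_scan(inputs, a)
    backward = ssm_scan(inputs[::-1], a)[::-1]
    return [f + g for f, g in zip(forward, backward)]


y = bidirectional_scan([1.0, 0.0, 0.0], a=0.5)
```

Each output depends on the whole sequence through the two hidden states, yet no pairwise token comparison is needed, which is where the quadratic cost of self-attention comes from.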
StreamVoice+: Evolving Into End-to-End Streaming Zero-Shot Voice Conversion
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3483012
Zhichao Wang;Yuanzhe Chen;Xinsheng Wang;Lei Xie;Yuping Wang
StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) into acoustic features with the desired speaker timbre. Despite its innovations, StreamVoice faces challenges due to its dependency on a streaming ASR within a cascaded framework: this dependency complicates system deployment and optimization, makes the VC system's design and performance depend on the choice of ASR, and undermines conversion stability when the semantic inputs are of low quality. To overcome these limitations, we introduce StreamVoice+, an enhanced LM-based end-to-end streaming framework that operates independently of streaming ASR. StreamVoice+ integrates a semantic encoder and a connector with the original StreamVoice framework, now trained using a non-streaming ASR. This model undergoes a two-stage training process: initially, the StreamVoice backbone is pre-trained for voice conversion and the semantic encoder for robust semantic extraction. Subsequently, the system is fine-tuned end-to-end, incorporating a LoRA matrix to activate comprehensive streaming functionality. Furthermore, StreamVoice+ introduces two main enhancements to boost conversion quality: a residual compensation mechanism in the connector to ensure effective semantic transmission and a self-refinement strategy that leverages pseudo-parallel speech pairs generated by the conversion backbone to improve speech decoupling. Experiments demonstrate that StreamVoice+ not only achieves higher naturalness and speaker similarity in voice conversion than its predecessor but also provides versatile support for both streaming and non-streaming conversion scenarios.
IEEE Signal Processing Letters, vol. 31, pp. 3000-3004.
Citations: 0
RFI-Aware and Low-Cost Maximum Likelihood Imaging for High-Sensitivity Radio Telescopes
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3483011
J. Wang;M. N. El Korso;L. Bacharach;P. Larzabal
This paper addresses the challenge of interference mitigation and reduction of computational cost in the context of radio interferometric imaging. We propose a novel maximum-likelihood methodology based on the antenna sub-array switching technique, which balances imaging accuracy against computational efficiency. In addition, we tackle robustness to radio interference by modeling the additive noise as t-distributed. Through simulation results, we demonstrate the superiority of the t-distributed noise model over the conventional Gaussian noise model in scenarios involving interference. We show that our proposed switching approach yields similar imaging performance with far fewer visibilities than the full array configuration, thereby reducing the computational complexity.
IEEE Signal Processing Letters, vol. 31, pp. 2960-2964.
Citations: 0
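Why a t-distributed noise model is more robust than a Gaussian one can be seen in a toy problem. The sketch below is not the imaging algorithm from the letter: it estimates a single location parameter from samples containing one outlier, using the standard EM reweighting for a Student-t likelihood with known scale. The degrees of freedom `nu`, the scale `sigma`, and the data values are all assumptions chosen for illustration.

```python
def t_location(samples, nu=3.0, sigma=1.0, iters=50):
    """Maximum-likelihood location under Student-t noise via EM reweighting.

    Samples with large residuals get small weights, so outliers
    (here standing in for interference) barely move the estimate.
    """
    mu = sum(samples) / len(samples)  # start from the Gaussian ML estimate
    for _ in range(iters):
        weights = [(nu + 1.0) / (nu + ((x - mu) / sigma) ** 2) for x in samples]
        mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return mu


data = [0.9, 1.1, 1.0, 0.95, 1.05, 25.0]   # last sample mimics an interference spike
gaussian_estimate = sum(data) / len(data)  # pulled far away by the outlier
robust_estimate = t_location(data)         # stays close to the bulk of the data
```

Under the Gaussian model every sample has equal weight, so the single spike shifts the mean substantially; under the t model its weight shrinks roughly with the inverse squared residual.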
Order Estimation of Linear-Phase FIR Filters for DAC Equalization in Multiple Nyquist Bands
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3483008
Deijany Rodriguez Linares;Håkan Johansson;Yinan Wang
This letter considers the design and properties of linear-phase finite-length impulse response (FIR) filters for equalization of the frequency responses of digital-to-analog converters (DACs). The letter derives estimates for the filter orders required, as functions of the bandwidth and equalization accuracy, for four DAC pulses that are used in DACs in multiple Nyquist bands. The estimates are derived through a large set of minimax-optimal equalizers and the use of symbolic regression followed by minimax-optimal curve fitting for further enhancement. Design examples demonstrate the accuracy of the proposed estimates. In addition, the letter discusses the appropriateness of the four types of linear-phase FIR filters, for the different equalizer cases, as well as the corresponding properties of the equalized systems.
IEEE Signal Processing Letters, vol. 31, pp. 2955-2959.
Citations: 0
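The equalization problem in this abstract can be made tangible with the most common DAC pulse. The sketch below assumes a zero-order-hold (NRZ) pulse, whose magnitude response is the familiar sinc roll-off; the abstract does not list the four pulses it studies, so NRZ is used here only as a well-known example. `type1_amplitude` evaluates the zero-phase response of an odd-length symmetric (linear-phase) FIR filter, which is the quantity an equalizer design would fit to the inverse droop.

```python
import math


def nrz_magnitude(f_norm):
    """Magnitude of a zero-order-hold (NRZ) DAC pulse at f_norm = f / fs."""
    if f_norm == 0.0:
        return 1.0
    x = math.pi * f_norm
    return abs(math.sin(x) / x)


def equalizer_gain_db(f_norm):
    """Gain in dB an ideal equalizer needs to undo the NRZ droop."""
    return -20.0 * math.log10(nrz_magnitude(f_norm))


def type1_amplitude(half_taps, omega):
    """Zero-phase amplitude of an odd-length symmetric FIR filter.

    half_taps = [h[M], h[M+1], ..., h[2M]] (center tap first);
    A(omega) = h[M] + 2 * sum_k h[M+k] * cos(k * omega).
    """
    tail = sum(h * math.cos(k * omega) for k, h in enumerate(half_taps[1:], start=1))
    return half_taps[0] + 2.0 * tail


gain_at_band_edge = equalizer_gain_db(0.5)  # droop at the first Nyquist edge
```

At the edge of the first Nyquist band the NRZ response has dropped to 2/pi, so the equalizer must supply roughly 3.9 dB of gain there; the wider the equalized band and the tighter the accuracy target, the higher the filter order one would expect to need.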
System-Informed Neural Network for Frequency Detection
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-17, DOI: 10.1109/LSP.2024.3483036
Sunyoung Ko;Myoungin Shin;Geunhwan Kim;Youngmin Choo
We devise a deep learning-based frequency analysis scheme called system-informed neural network (SINN) by considering the corresponding linear system model. SINN adopts the adaptive learned iterative soft shrinkage algorithm as the NN architecture and includes the system model in the loss function. It generalizes well with fast processing time and finds a solution that satisfies the system model, as a physics-informed neural network does. To further improve SINN, multiple measurements are exploited by assuming the existence of common frequency components over the measurements. SINN is examined using simulated acoustic data, and the performance is compared to Fourier transform and sparse Bayesian learning (SBL) in terms of the detection/false alarm rate and mean squared error. SINN exhibits clear frequency components in in-situ data tests, as in SBL, by reducing noise effectively. Finally, SINN is applied to noisy passive sonar signals, which include 43 frequency components, and many are recovered.
IEEE Signal Processing Letters, vol. 31, pp. 2980-2984.
Citations: 0
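The network in this abstract builds on iterative soft shrinkage. As a point of reference, here is a minimal sketch of the classical, non-learned iteration (ISTA) for a sparse linear model y = A x. It is not SINN: learned variants replace the fixed step size and threshold with trained parameters, and the system-model loss term is not shown. The matrix `A`, the measurement `y`, and the values of `step` and `lam` are assumptions for this toy example.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values inside [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, step, lam, iters=2000):
    """Iterative soft shrinkage for min 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = y - A x
        residual = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
        # gradient step direction A^T r
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n)]
        x = [soft_threshold(xj + step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x


# Two measurements, three candidate components; only the first is active.
A = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
y = [1.0, 0.0]
x_hat = ista(A, y, step=0.5, lam=0.01)
```

Even though the system is underdetermined, the shrinkage step drives the inactive components to zero and leaves a slightly biased estimate of the active one, which is the sparse-recovery behavior that frequency detection relies on.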
Discriminability-Aware Intermediate Domains for Mismatched Steganalysis
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-16, DOI: 10.1109/LSP.2024.3482184
Yang Li;Lifang Yu;Shaowei Weng;Huawei Tian;Gang Cao
This letter proposes GDNet equipped with the generation of discriminative mixing regions (GDMR) and discriminability-aware local image mixing (DLIM), a steganalysis network aiming at alleviating significant accuracy degradation caused by cover-source mismatch (CSM), which pertains to the situation where source and target domains come from different distributions. GDNet guides a steganalyzer trained on the source domain to the target domain by mixing the source and target images at the region-level and pixel-level to construct a discriminative intermediate domain. On the one hand, GDMR designs an epoch-related region-level mixing ratio to control the size of the mixed region, and based on this ratio, selects the regions within the target image strongly related to the stego signal to participate in the generation of the intermediate domain, while suppressing other regions weakly related to the stego signal. On the other hand, DLIM utilizes the pixel-level mixing ratio to reduce the impact of the regions weakly related to the stego signal on the discriminability of the intermediate domain as the region-level mixing ratio increases, thereby increasing the diversity of the intermediate domain. Experimental results demonstrate that GDNet significantly outperforms existing methods across various CSM scenarios.
IEEE Signal Processing Letters, vol. 31, pp. 3054-3058.
Citations: 0
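The two mixing levels described in this abstract can be sketched generically. The code below is a hypothetical illustration, not GDNet: the region is passed in as a given rectangle (the paper selects regions by their relation to the stego signal, which is not reproduced here), `region_side` is an invented linear schedule standing in for the epoch-related region-level ratio, and `pixel_ratio` is a plain blending weight.

```python
def region_side(image_side, epoch, total_epochs):
    """Hypothetical schedule: the mixed region grows linearly with the epoch."""
    return max(1, image_side * (epoch + 1) // total_epochs)


def mix_images(source, target, region, pixel_ratio):
    """Blend target pixels into the source image inside one rectangle.

    region = (top, left, height, width); outside it the source is unchanged.
    """
    top, left, height, width = region
    mixed = [row[:] for row in source]
    for r in range(top, top + height):
        for c in range(left, left + width):
            mixed[r][c] = (1.0 - pixel_ratio) * source[r][c] + pixel_ratio * target[r][c]
    return mixed


source = [[0.0] * 4 for _ in range(4)]
target = [[1.0] * 4 for _ in range(4)]
mixed = mix_images(source, target, region=(1, 1, 2, 2), pixel_ratio=0.25)
```

Controlling the rectangle size and the blending weight separately gives a family of images that sit between the source and target distributions, which is the role an intermediate domain plays.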
A Method for Generating Pseudo-Polarization Images
IF 3.2, Zone 2 (Engineering Technology), Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC), Pub Date: 2024-10-16, DOI: 10.1109/LSP.2024.3482229
Shiyu Li;Meijing Gao;Xiangrui Fan;Yang Bai;Yonghao Yan
This letter proposes an algorithm to generate pseudo-polarization images in situations with limited polarization image samples. The algorithm is inspired by the principle of division-of-focal-plane (DoFP) polarimetric imaging and utilizes a specially designed 2×2 pseudo-polarization filter to generate two sets of orthogonal pseudo-polarization images. Furthermore, Gaussian filter layering and an image gradient-based feature search method are employed for the simulated generation of polarization features for both specular and diffuse reflections. Experimental results indicate high correspondence between the generated pseudo-polarization and real polarization images. The method effectively simulates polarization images acquired by the DoFP polarimeter under different conditions.
本信提出了一种在偏振图像样本有限的情况下生成伪偏振图像的算法。该算法受 DoFP 偏振成像原理的启发,利用专门设计的 2×2 伪偏振滤波器生成两组正交的伪偏振图像。此外,还采用了高斯滤波器分层和基于图像梯度的特征搜索方法来模拟生成镜面反射和漫反射的偏振特征。实验结果表明,生成的伪偏振图像与真实偏振图像的对应性很高。该方法可有效模拟 DoFP 偏振计在不同条件下获取的偏振图像。
{"title":"A Method for Generating Pseudo-Polarization Images","authors":"Shiyu Li;Meijing Gao;Xiangrui Fan;Yang Bai;Yonghao Yan","doi":"10.1109/LSP.2024.3482229","DOIUrl":"https://doi.org/10.1109/LSP.2024.3482229","url":null,"abstract":"This letter proposes an algorithm to generate pseudo-polarization images under situations with limited polarization image samples. The algorithm is inspired by the principle of polarimetric imaging with DoFP and utilizes a specially designed 2×2 pseudo-polarization filter to generate two sets of orthogonal pseudo-polarization images. Furthermore, the Gaussian filter layering and image gradient-based feature search method are employed for the simulated generation of polarization features for both specular and diffuse reflections. Experiment results indicate high correspondence between the generated pseudo-polarization and real polarization images. The method effectively simulates polarization images acquired by the DoFP polarimeter under different conditions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3030-3033"},"PeriodicalIF":3.2,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
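For context on the pseudo-polarization entry above: a division-of-focal-plane (DoFP) polarimeter places a 2×2 micro-polarizer pattern over the sensor, so each superpixel yields two orthogonal pairs (0°/90° and 45°/135°). The sketch below shows only the standard step of splitting such a mosaic and computing linear Stokes parameters. It is not the letter's pseudo-polarization filter. The superpixel layout and the function names `split_dofp_mosaic` and `stokes_from_channels` are assumptions made for illustration; real sensors differ in layout.

```python
import numpy as np


def split_dofp_mosaic(raw):
    """Split a DoFP mosaic into its four polarization channels.

    Assumed (hypothetical) superpixel layout:
        [[ 90,  45],
         [135,   0]]
    """
    raw = np.asarray(raw, dtype=np.float64)
    h, w = raw.shape
    # Crop to even dimensions so every superpixel is complete.
    raw = raw[: h - h % 2, : w - w % 2]
    i90 = raw[0::2, 0::2]
    i45 = raw[0::2, 1::2]
    i135 = raw[1::2, 0::2]
    i0 = raw[1::2, 1::2]
    return i0, i45, i90, i135


def stokes_from_channels(i0, i45, i90, i135, eps=1e-12):
    """Linear Stokes parameters, degree and angle of linear polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # first orthogonal pair
    s2 = i45 - i135                     # second orthogonal pair
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    aop = 0.5 * np.arctan2(s2, s1)
    return s0, s1, s2, dolp, aop
```

A generated pseudo-polarization image could be compared against a real one by running both through the same two functions and comparing the resulting degree and angle maps.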
Learning Noise Adapters for Incremental Speech Enhancement
IF 3.2 Zone 2 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-16 DOI: 10.1109/LSP.2024.3482171
Ziye Yang;Xiang Song;Jie Chen;Cédric Richard;Israel Cohen
Incremental speech enhancement (ISE), with the ability to incrementally adapt to new noise domains, represents a critical yet comparatively under-investigated topic. While the regularization-based method has been proposed to solve the ISE task, it usually suffers from the dilemma wherein the gain of one domain directly entails the loss of another. To solve this issue, we propose an effective paradigm, termed Learning Noise Adapters (LNA), which significantly mitigates the catastrophic domain forgetting phenomenon in the ISE task. In our methodology, we employ a frozen pre-trained model to train and retain a domain-specific adapter for each newly encountered domain, enabling the capture of variations in feature distributions within these domains. Subsequently, our approach involves the development of an unsupervised, training-free noise selector for the inference stage, which is responsible for identifying the domains of test speech samples. A comprehensive experimental validation has substantiated the effectiveness of our approach.
Ziye Yang; Xiang Song; Jie Chen; Cédric Richard; Israel Cohen. "Learning Noise Adapters for Incremental Speech Enhancement." IEEE Signal Processing Letters, vol. 31, pp. 2915-2919. Journal Article. Published 2024-10-16. https://doi.org/10.1109/LSP.2024.3482171
Citations: 0
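The Learning Noise Adapters abstract above describes keeping one adapter per noise domain and choosing among them at inference with an unsupervised, training-free selector. The abstract does not say how that selector works, so the sketch below substitutes a plain nearest-centroid rule as a stand-in. The class `AdapterBank`, its methods, and the use of Euclidean distance are all hypothetical, and the adapters are arbitrary callables rather than real network modules.

```python
import numpy as np


class AdapterBank:
    """Hypothetical store of one adapter and one feature centroid per domain."""

    def __init__(self):
        self.adapters = {}
        self.centroids = {}

    def register(self, domain, adapter, features):
        """Add a domain without touching previously stored adapters.

        features: (n_samples, dim) embeddings, assumed to come from the
        frozen pre-trained model.
        """
        self.adapters[domain] = adapter
        feats = np.asarray(features, dtype=np.float64)
        self.centroids[domain] = feats.mean(axis=0)

    def select(self, feature):
        """Pick the domain whose centroid is closest (stand-in selector)."""
        feature = np.asarray(feature, dtype=np.float64)
        return min(
            self.centroids,
            key=lambda d: np.linalg.norm(feature - self.centroids[d]),
        )

    def enhance(self, feature):
        """Route a test feature through the adapter of its selected domain."""
        feature = np.asarray(feature, dtype=np.float64)
        domain = self.select(feature)
        return domain, self.adapters[domain](feature)
```

Because each new domain only adds an entry to the bank, earlier adapters stay unchanged, which is the property the abstract credits with reducing forgetting.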