
Latest Publications in IEEE Signal Processing Letters

Multi-Level Adaptive Attention Fusion Network for Infrared and Visible Image Fusion
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509341
Ziming Hu;Quan Kong;Qing Liao
Infrared and visible image fusion integrates complementary or critical information extracted from different source images into a single image. Because the features of the two modalities differ significantly, both from each other and across scales, traditional fusion strategies such as addition or concatenation often cause information redundancy or degrade crucial information. This letter proposes a multi-level adaptive attention fusion network that adaptively fuses features extracted from different sources. Specifically, we introduce an Adaptive Scale Attention Fusion (ASAF) module that uses a soft selection mechanism to assess the relative importance of different modality features at the same scale and assign corresponding fusion weights. Additionally, a guided upsampling layer integrates shallow and deep feature information at different scales in the multi-scale structure. Qualitative and quantitative results on public datasets validate the superior performance of our approach in both visual effects and quantitative metrics.
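The soft-selection idea behind ASAF can be sketched in a few lines: per-pixel importance scores for the two modalities are normalized with a softmax and used as fusion weights. This is a minimal numpy illustration, not the paper's network; the score maps `w_ir` and `w_vis` are hypothetical stand-ins for learned attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_selection_fuse(feat_ir, feat_vis, w_ir, w_vis):
    """Fuse two same-scale feature maps with per-pixel soft-selection weights.

    feat_*: (H, W, C) feature maps from the two modalities.
    w_*:    (H, W) unnormalized importance scores (e.g. from a small conv head).
    """
    scores = np.stack([w_ir, w_vis], axis=-1)       # (H, W, 2)
    weights = softmax(scores, axis=-1)              # soft selection: sums to 1
    fused = (weights[..., 0:1] * feat_ir
             + weights[..., 1:2] * feat_vis)        # (H, W, C) weighted sum
    return fused, weights

# toy check: equal scores reduce to a plain average of the two modalities
f_ir, f_vis = np.ones((4, 4, 8)), np.zeros((4, 4, 8))
fused, w = soft_selection_fuse(f_ir, f_vis, np.zeros((4, 4)), np.zeros((4, 4)))
```

With unequal scores the softmax shifts weight toward the more important modality at each pixel, which is the "soft selection" the abstract describes.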
{"title":"Multi-Level Adaptive Attention Fusion Network for Infrared and Visible Image Fusion","authors":"Ziming Hu;Quan Kong;Qing Liao","doi":"10.1109/LSP.2024.3509341","DOIUrl":"https://doi.org/10.1109/LSP.2024.3509341","url":null,"abstract":"Infrared and visible image fusion involves integrating complementary or critical information extracted from different source images into one image. Due to the significant differences between the two modality features and those across different scales, traditional fusion strategies, such as addition or concatenation, often result in information redundancy or the degradation of crucial information. This letter proposes a multi-level adaptive attention fusion network to adaptively fuse features extracted from different sources. Specifically, we introduced an Adaptive Scale Attention Fusion (ASAF) module that uses a soft selection mechanism to assess the relative importance of different modality features at the same scale and assign corresponding fusion weights. Additionally, a guided upsampling layer is utilized to integrate shallow and deep feature information at different scales in the multi-scale structure. Qualitative and quantitative results on public datasets validate the superior performance of our approach in both visual effects and quantitative metrics.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"366-370"},"PeriodicalIF":3.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142912465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Cluster Guided Truncated Hashing for Enhanced Approximate Nearest Neighbor Search
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509333
Mingyang Liu;Zuyuan Yang;Wei Han;Shengli Xie
Hashing is essential for approximate nearest neighbor search by mapping high-dimensional data to compact binary codes. The balance between similarity preservation and code diversity is a key challenge. Existing projection-based methods often struggle with fitting binary codes to continuous space due to space heterogeneity. To address this, we propose a novel Cluster Guided Truncated Hashing (CGTH) method that uses latent cluster information to guide the binary learning process. By leveraging data clusters as anchor points and applying a truncated coding strategy, our method effectively maintains local similarity and code diversity. Experiments on benchmark datasets demonstrate that CGTH outperforms existing methods, achieving superior search performance.
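As a toy illustration of anchor-guided truncated coding (a sketch of the general idea only, not the paper's CGTH objective), one can set a code bit only for each point's few nearest cluster anchors, so nearby points share bits while distant points do not:

```python
import numpy as np

def truncated_anchor_codes(X, anchors, t=2):
    """Binary codes from cluster anchors: set a bit only for the t nearest
    anchors of each point (truncation), zero elsewhere."""
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m) sq. dists
    order = np.argsort(d, axis=1)                             # nearest first
    codes = np.zeros_like(d, dtype=np.uint8)
    rows = np.arange(X.shape[0])[:, None]
    codes[rows, order[:, :t]] = 1
    return codes

rng = np.random.default_rng(0)
anchors = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
X = anchors + 0.1 * rng.standard_normal((4, 2))   # one point near each anchor
codes = truncated_anchor_codes(X, anchors, t=1)
```

Truncation caps the number of active bits per code, which keeps the codes sparse and diverse while local neighbors still collide on the same anchors.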
{"title":"Cluster Guided Truncated Hashing for Enhanced Approximate Nearest Neighbor Search","authors":"Mingyang Liu;Zuyuan Yang;Wei Han;Shengli Xie","doi":"10.1109/LSP.2024.3509333","DOIUrl":"https://doi.org/10.1109/LSP.2024.3509333","url":null,"abstract":"Hashing is essential for approximate nearest neighbor search by mapping high-dimensional data to compact binary codes. The balance between similarity preservation and code diversity is a key challenge. Existing projection-based methods often struggle with fitting binary codes to continuous space due to space heterogeneity. To address this, we propose a novel Cluster Guided Truncated Hashing (CGTH) method that uses latent cluster information to guide the binary learning process. By leveraging data clusters as anchor points and applying a truncated coding strategy, our method effectively maintains local similarity and code diversity. Experiments on benchmark datasets demonstrate that CGTH outperforms existing methods, achieving superior search performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"181-185"},"PeriodicalIF":3.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-27 | DOI: 10.1109/LSP.2024.3506859
Yifei Li;Mai Xu;Shengxi Li;Jialu Zhang;Zhenyu Guan
Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from merely small dimensions, which essentially establishes a latent generating space with rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing access to semantics from images. However, existing GAN inversion methods can only invert images with fixed resolutions, which significantly restricts their representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, hence the name patch inverter, which is the first attempt at block-wise inversion for arbitrary resolutions. More specifically, we develop a padding-free operation to ensure continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains continuity across image patches while enlarging the receptive field of each convolution layer. We further propose a reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics can be maximally preserved. Experimental results demonstrate that our patch inverter accurately inverts images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.
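Why a padding-free operation preserves continuity across patches can be seen in one dimension: if adjacent patches overlap by kernel_size - 1 samples, their "valid" (no-padding) convolutions concatenate seamlessly into the convolution of the whole signal, with no seam artifacts at the patch border. A small numpy check (illustrative only; the paper works with 2-D feature maps inside a network):

```python
import numpy as np

k = np.array([0.25, 0.5, 0.25])           # a small smoothing kernel
x = np.arange(16, dtype=float)

# full-signal valid convolution: no padding, so no border artifacts
full = np.convolve(x, k, mode="valid")    # length 16 - 3 + 1 = 14

# patch-wise: split into two patches that overlap by (kernel_size - 1) samples
left, right = x[:9], x[7:]                # overlap of 2 samples
patched = np.concatenate([np.convolve(left, k, mode="valid"),
                          np.convolve(right, k, mode="valid")])
```

With zero-padded patches instead, each patch border would produce darkened or biased outputs, which is exactly the seam a block-wise inverter has to avoid.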
生成式对抗网络(GAN)在从小维度生成逼真图像方面取得了显著进展,这从根本上通过丰富的语义建立了潜在生成空间。因此,GAN 反演旨在将现实世界的图像映射回潜在空间,从而从图像中获取语义。然而,现有的 GAN 反演方法只能反演具有固定分辨率的图像,这大大限制了真实世界场景中的表示能力。为了解决这个问题,我们提出了通过补丁反转图像的方法,因此被命名为补丁反转,这是首次尝试针对任意分辨率的分块反转。更具体地说,我们开发了无填充操作,以确保跨补丁的连续性,并分析了反转过程中的内在不匹配问题。为了缓解这种不匹配,我们提出了一种移位卷积操作,它既能保持图像斑块间的连续性,又能同时扩大每个卷积层的感受野。我们还进一步提出了倒易损失法,将反转潜码正则化,使其驻留在原始潜码生成空间,从而最大限度地保留了丰富的语义。实验结果表明,我们的补丁反相器能够准确反相任意分辨率的图像,同时在真实世界场景中呈现精确而丰富的图像语义。
{"title":"Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions","authors":"Yifei Li;Mai Xu;Shengxi Li;Jialu Zhang;Zhenyu Guan","doi":"10.1109/LSP.2024.3506859","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506859","url":null,"abstract":"Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from merely small dimensions, which essentially establishes the latent generating space by rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing for the access of semantics from images. However, existing GAN inversion methods can only invert images with fixed resolutions; this significantly restricts the representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, thus named as patch inverter, which is the first attempt in terms of block-wise inversion for arbitrary resolutions. More specifically, we develop the padding-free operation to ensure the continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains the continuity across image patches and simultaneously enlarges the receptive field for each convolution layer. We further propose the reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics can be maximally preserved. 
Experimental results have demonstrated that our patch inverter is able to accurately invert images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"171-175"},"PeriodicalIF":3.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Adaptive Depth Enhancement Network for RGB-D Salient Object Detection
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-26 | DOI: 10.1109/LSP.2024.3506863
Kang Yi;Yumeng Li;Haoran Tang;Jing Xu
RGB-D Salient Object Detection (SOD) aims to identify and highlight the most visually prominent objects in complex backgrounds by leveraging both RGB and depth information. However, depth maps often suffer from noise and inconsistencies due to imaging modalities and sensor limitations. Additionally, reconciling low-level spatial details with high-level semantic information across multiple levels adds another layer of complexity. These issues result in depth maps that may not align well with the corresponding RGB images, causing incorrect foreground and background segmentation. To address these issues, we propose a novel adaptive depth enhancement network (ADENet), which adopts a Depth Feature Refinement (DFR) module to mitigate the negative impact of low-quality depth data and improve the synergy between multi-modal features. We also design a simple yet effective Cross Modality Fusion (CMF) module that combines spatial and channel attention mechanisms to calibrate single-modality features and boost the fusion. A Progressive Multiscale Aggregation (PMA) decoder is also introduced to integrate multiscale features, promoting more globally retained information. Extensive experiments illustrate that the proposed ADENet is superior to 10 state-of-the-art methods on four benchmark datasets.
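A minimal numpy sketch of the calibrate-then-fuse pattern: one modality is gated per channel (a squeeze statistic), the other per location, before the two are combined. The pooled-statistic gates below are illustrative stand-ins for the learned layers of the CMF module, and addition stands in for its fusion.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat):
    """Gate each channel by its global-average-pooled descriptor."""
    s = feat.mean(axis=(0, 1))                 # (C,) channel statistic
    return feat * sigmoid(s)[None, None, :]

def spatial_attention(feat):
    """Gate each spatial location by its mean activation across channels."""
    s = feat.mean(axis=2)                      # (H, W) spatial statistic
    return feat * sigmoid(s)[:, :, None]

def cross_modality_fuse(rgb_feat, depth_feat):
    """Calibrate each modality with its own attention, then fuse by addition."""
    return channel_attention(rgb_feat) + spatial_attention(depth_feat)

rgb, depth = np.ones((2, 2, 3)), np.zeros((2, 2, 3))
fused = cross_modality_fuse(rgb, depth)
```

The point of calibrating before fusing is that a noisy depth map gets down-weighted by its own gates rather than contaminating the RGB stream.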
{"title":"Adaptive Depth Enhancement Network for RGB-D Salient Object Detection","authors":"Kang Yi;Yumeng Li;Haoran Tang;Jing Xu","doi":"10.1109/LSP.2024.3506863","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506863","url":null,"abstract":"RGB-D Salient Object Detection (SOD) aims to identify and highlight the most visually prominent objects from complex backgrounds by leveraging both RGB and depth information. However, depth maps often suffer from noise and inconsistencies due to the imaging modalities and sensor limitations. Additionally, the low-level spatial details and high-level semantic information from multiple levels pose another complexity layer. These issues result in depth maps that may not align well with the corresponding RGB images, causing incorrect foreground and background segmentation. To address these issues, we propose a novel adaptive depth enhancement network (ADENet), which adopts the Depth Feature Refinement (DFR) module to mitigate the negative impact of low-quality depth data and improve the synergy between multi-modal features. We also design a simple yet effective Cross Modality Fusion (CMF) module that combines the spatial and channel attention mechanisms to calibrate single modality features and boost the fusion. The Progressive Multiscale Aggregation (PMA) decoder has also been introduced to integrate multiscale features, promoting more globally retained information. 
Extensive experiments illustrate that our proposed ADENet is superior to the other 10 state-of-the-art methods on four benchmark datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"176-180"},"PeriodicalIF":3.2,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Many-to-Many Singing Performance Style Transfer on Pitch and Energy Contours
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3506858
Yu-Teng Hsu;Jun-You Wang;Jyh-Shing Roger Jang
Singing voice conversion (SVC) aims to convert the singer identity of a singing voice to that of another singer. However, most existing SVC systems only convert timbre information, leaving other information unchanged. This approach ignores other aspects of singer identity, particularly a singer's performance style, which is reflected in the pitch (F0) and energy (volume dynamics) contours of singing. To address this issue, this paper proposes a many-to-many singing performance style transfer system that converts the pitch and energy contours of one singer's style to another's. To achieve this, we utilize two AutoVC-like autoencoders with an information bottleneck to automatically disentangle performance style from other musical content: one for the pitch contour and another for the energy contour. Experimental results suggest that the proposed model can perform singing performance style transfer in a many-to-many conversion scenario, improving singer identity similarity to the target singer.
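As a first-order illustration of "performance style" living in contour statistics (this is a motivating sketch only, not the paper's AutoVC-based model, and the target statistics here are made-up numbers), one can re-express a source pitch contour with a target singer's mean and dynamic range:

```python
import numpy as np

def transfer_contour_stats(src, tgt_mean, tgt_std):
    """Re-express a pitch or energy contour with target-singer statistics.

    Normalizing out the source mean/spread and re-scaling to the target's is a
    crude, first-order notion of contour 'style'; the paper instead learns the
    mapping with bottlenecked autoencoders.
    """
    z = (src - src.mean()) / (src.std() + 1e-8)   # style-free contour shape
    return z * tgt_std + tgt_mean

f0_src = np.array([200., 210., 205., 220., 215.])     # toy F0 contour in Hz
f0_out = transfer_contour_stats(f0_src, tgt_mean=300.0, tgt_std=30.0)
```

The learned approach matters precisely because real style is more than mean and variance (vibrato, portamento, phrasing), which a statistics swap like this cannot capture.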
{"title":"Many-to-Many Singing Performance Style Transfer on Pitch and Energy Contours","authors":"Yu-Teng Hsu;Jun-You Wang;Jyh-Shing Roger Jang","doi":"10.1109/LSP.2024.3506858","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506858","url":null,"abstract":"Singing voice conversion (SVC) aims to convert the singer identity of a singing voice to that of another singer. However, most existing SVC systems only perform the conversion of timbre information, while leaving other information unchanged. This approach does not consider other aspects of singer identity, particularly a singer's performance style, which is reflected in the pitch (F0) and the energy (volume dynamics) contours of singing. To address this issue, this paper proposes a many-to-many singing performance style transfer system that converts the pitch and energy contours of one singer's style to another singer's. To achieve this target, we utilize two AutoVC-like autoencoders with an information bottleneck to automatically disentangle performance style from other musical contents, one for the pitch contour while another for the energy contour. Experiment results suggested that the proposed model can perform singing performance style transfer in a many-to-many conversion scenario, resulting in improved singer identity similarity to the target singer.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"166-170"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PACMR: Progressive Adaptive Crossmodal Reinforcement for Multimodal Apparent Personality Traits Analysis
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3505799
Peng Shen;Dandan Wang;Yingying Xu;Shiqing Zhang;Xiaoming Zhao
Multimodal apparent personality traits analysis is a challenging issue due to the asynchrony among modalities. To address this issue, this paper proposes a Progressive Adaptive Crossmodal Reinforcement (PACMR) approach for multimodal apparent personality traits analysis. PACMR adopts a progressive reinforcement strategy to provide a multi-level information exchange among different modalities for crossmodal interactions, resulting in reinforcing the source and target modalities simultaneously. Specifically, PACMR introduces an Adaptive Modality Reinforcement Unit (AMRU) to adaptively adjust the weights of self-attention and crossmodal attention for capturing reliable contextual dependencies of multimodal sequence data. Experiment results on the public First Impressions dataset demonstrate the effectiveness of the proposed method.
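The adaptive weighting in AMRU can be reduced to its core: a learned gate that decides how much to trust self-attention versus crossmodal attention at each step. A minimal numpy sketch under the assumption that both attention outputs are already computed; in practice the gate logit would come from a small network rather than being a free parameter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_reinforce(self_attn_out, cross_attn_out, gate_logit):
    """Blend self-attention and crossmodal-attention outputs with a gate.

    alpha -> 1 trusts within-modality context; alpha -> 0 trusts the other
    modality, which is useful when the modalities are asynchronous.
    """
    alpha = sigmoid(gate_logit)
    return alpha * self_attn_out + (1.0 - alpha) * cross_attn_out

a = np.full((3, 4), 2.0)       # stand-in self-attention output
b = np.zeros((3, 4))           # stand-in crossmodal-attention output
out = adaptive_reinforce(a, b, gate_logit=0.0)   # alpha = 0.5: equal blend
```

Making alpha input-dependent is what lets the model fall back on within-modality context when the other modality is momentarily uninformative.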
{"title":"PACMR: Progressive Adaptive Crossmodal Reinforcement for Multimodal Apparent Personality Traits Analysis","authors":"Peng Shen;Dandan Wang;Yingying Xu;Shiqing Zhang;Xiaoming Zhao","doi":"10.1109/LSP.2024.3505799","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505799","url":null,"abstract":"Multimodal apparent personality traits analysis is a challenging issue due to the asynchrony among modalities. To address this issue, this paper proposes a Progressive Adaptive Crossmodal Reinforcement (PACMR) approach for multimodal apparent personality traits analysis. PACMR adopts a progressive reinforcement strategy to provide a multi-level information exchange among different modalities for crossmodal interactions, resulting in reinforcing the source and target modalities simultaneously. Specifically, PACMR introduces an Adaptive Modality Reinforcement Unit (AMRU) to adaptively adjust the weights of self-attention and crossmodal attention for capturing reliable contextual dependencies of multimodal sequence data. Experiment results on the public First Impressions dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"161-165"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
An Attention-Based Feature Processing Method for Cross-Domain Hyperspectral Image Classification
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3505793
Yazhen Wang;Guojun Liu;Lixia Yang;Junmin Liu;Lili Wei
Cross-domain classification of hyperspectral remote sensing images has been a research hotspot in recent years; its main difficulty is the shortage of training samples. To address this issue, few-shot learning (FSL) has emerged as a promising paradigm for cross-domain classification tasks. However, a notable limitation of most existing FSL methods is that they focus only on local information and neglect the critical role of global information. Based on this, this paper proposes a new feature processing method with adaptive band selection that takes the global nature of image features into account. First, adaptive band analysis is performed in the target domain, and threshold analysis determines the number of selected bands. Second, a band selection method selects representative bands from the spectral bands of the high-dimensional data according to the determined band count. Finally, the weights of the selected bands are analyzed, fully considering the importance of pixel weights, and the results are used as inputs to the classification model. Experimental results on various datasets show that this method effectively improves classification accuracy and generalization ability. In addition, the objective accuracy index of the proposed method improves by 3.9%, 4.7%, and 5.4% on the respective databases.
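A hedged numpy sketch of threshold-driven band selection: rank bands by variance and keep the smallest set that explains a fixed fraction of total variance. The variance criterion and the 0.9 threshold are illustrative stand-ins for the paper's adaptive band and threshold analysis, not its actual scoring.

```python
import numpy as np

def select_bands(cube, energy_threshold=0.9):
    """Pick representative spectral bands of a (H, W, B) hyperspectral cube.

    The kept count k is the smallest number of top-variance bands whose
    variance sum reaches `energy_threshold` of the total (a simple stand-in
    for threshold analysis determining the band count).
    """
    var = cube.reshape(-1, cube.shape[-1]).var(axis=0)   # per-band variance
    order = np.argsort(var)[::-1]                        # most informative first
    cum = np.cumsum(var[order]) / var.sum()
    k = int(np.searchsorted(cum, energy_threshold) + 1)
    return np.sort(order[:k]), var

rng = np.random.default_rng(1)
cube = rng.standard_normal((8, 8, 5))
cube[..., 2] *= 10.0                      # band 2 dominates the variance
bands, var = select_bands(cube, 0.9)      # selects only band 2 here
```

Letting the threshold, rather than a fixed k, set the band count is what makes the selection adapt to how concentrated the spectral information is.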
{"title":"An Attention-Based Feature Processing Method for Cross-Domain Hyperspectral Image Classification","authors":"Yazhen Wang;Guojun Liu;Lixia Yang;Junmin Liu;Lili Wei","doi":"10.1109/LSP.2024.3505793","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505793","url":null,"abstract":"Cross-domain classification of hyperspectral remote sensing images is one of the hotspots of research in recent years, and its main problem is insufficient training samples. To address this issue, few-shot learning (FSL) has emerged as a promising paradigm in cross-domain classification tasks. However, a notable limitation of most existing FSL methods is that they focus only on local information and less on the critical role of global information. Based on this, this paper proposes a new feature processing method with adaptive band selection, which takes into account the global nature of image features. Firstly, adaptive band analysis is performed in the target domain, and threshold analysis is used to determine the number of selected bands. Secondly, a band selection method is employed to select representative bands from the spectral bands of the high-dimensional data according to the determined band count. Finally, the weights of the selected bands are analyzed, fully considering the importance of pixel weight, and then the results are used as inputs for the classification model. The experimental results on various datasets show that this method can effectively improve the classification accuracy and generalization ability. 
Meanwhile, the results of the objective accuracy index of the proposed method in different databases improved by 3.9%, 4.7% and 5.4%.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"196-200"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Controllable Conformer for Speech Enhancement and Recognition
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3505794
Zilu Guo;Jun Du;Sabato Marco Siniscalchi;Jia Pan;Qingfeng Liu
We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). It is observed that while mild denoising typically preserves speech naturalness, stronger denoising can improve human auditory tasks but often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach.
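The underlying trade-off can be pictured as interpolating between two operating points of the same enhancer. The sketch below uses plain linear blending with a control factor c, whereas the paper drives the interpolation with differential equations and a learned control-factor embedding; the arrays are toy stand-ins for denoised waveforms.

```python
import numpy as np

def blend_denoised(mild, strong, c):
    """Interpolate between a mildly and a strongly denoised signal.

    c in [0, 1] plays the role of the control factor: c = 0 keeps the mild
    output (low distortion, ASR-friendly), c = 1 the strong one (clean
    listening), and intermediate c trades one against the other.
    """
    return (1.0 - c) * mild + c * strong

mild = np.array([1.0, 2.0, 3.0])      # toy mildly-denoised samples
strong = np.array([0.0, 0.0, 0.0])    # toy aggressively-denoised samples
half = blend_denoised(mild, strong, 0.5)
```

Exposing c at inference time means one trained model can serve both the human-listening and the ASR operating point without retraining.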
{"title":"Controllable Conformer for Speech Enhancement and Recognition","authors":"Zilu Guo;Jun Du;Sabato Marco Siniscalchi;Jia Pan;Qingfeng Liu","doi":"10.1109/LSP.2024.3505794","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505794","url":null,"abstract":"We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). It is observed that while mild denoising typically preserves speech naturalness, stronger denoising can improve human auditory tasks but often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"156-160"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Quaternion Vector Quantized Variational Autoencoder
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-22 | DOI: 10.1109/LSP.2024.3504374
Hui Luo;Xin Liu;Jian Sun;Yang Zhang
Vector quantized variational autoencoders, as variants of variational autoencoders, effectively capture discrete representations by quantizing continuous latent spaces and are widely used in generative tasks. However, these models still face limitations in handling complex image reconstruction, particularly in preserving high-quality details. Moreover, quaternion neural networks have shown unique advantages in handling multi-dimensional data, indicating that integrating quaternion approaches could potentially improve the performance of these autoencoders. To this end, we propose QVQ-VAE, a lightweight network in the quaternion domain that introduces a quaternion-based quantization layer and training strategy to improve reconstruction precision. By fully leveraging quaternion operations, QVQ-VAE reduces the number of model parameters, thereby lowering computational resource demands. Extensive evaluations on face and general object reconstruction tasks show that QVQ-VAE consistently outperforms existing methods while using significantly fewer parameters.
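The quantization step of a VQ-style bottleneck, specialized to quaternion-valued latents (each latent is a 4-vector snapped to its nearest codeword), can be sketched as follows. The codebook entries and latents are made-up toy values; a real QVQ-VAE learns the codebook end-to-end and uses quaternion operations throughout the network.

```python
import numpy as np

def quantize_quaternions(z, codebook):
    """Snap each quaternion latent (a 4-vector) to its nearest codeword.

    z:        (n, 4) quaternion-valued latents from the encoder.
    codebook: (K, 4) quaternion codewords (learnable in a real model).
    Returns the quantized latents and the chosen codeword indices.
    """
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K)
    idx = d.argmin(axis=1)                                     # nearest codeword
    return codebook[idx], idx

codebook = np.array([[1., 0., 0., 0.],     # toy codeword: identity quaternion
                     [0., 1., 0., 0.]])
z = np.array([[0.9, 0.1, 0., 0.],
              [0.1, 0.95, 0., 0.]])
q, idx = quantize_quaternions(z, codebook)
```

Treating each latent as one 4-dimensional quaternion rather than four independent scalars is what lets the codebook, and hence the parameter count, stay small.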
{"title":"Quaternion Vector Quantized Variational Autoencoder","authors":"Hui Luo;Xin Liu;Jian Sun;Yang Zhang","doi":"10.1109/LSP.2024.3504374","DOIUrl":"https://doi.org/10.1109/LSP.2024.3504374","url":null,"abstract":"Vector quantized variational autoencoders, as variants of variational autoencoders, effectively capture discrete representations by quantizing continuous latent spaces and are widely used in generative tasks. However, these models still face limitations in handling complex image reconstruction, particularly in preserving high-quality details. Moreover, quaternion neural networks have shown unique advantages in handling multi-dimensional data, indicating that integrating quaternion approaches could potentially improve the performance of these autoencoders. To this end, we propose QVQ-VAE, a lightweight network in the quaternion domain that introduces a quaternion-based quantization layer and training strategy to improve reconstruction precision. By fully leveraging quaternion operations, QVQ-VAE reduces the number of model parameters, thereby lowering computational resource demands. Extensive evaluations on face and general object reconstruction tasks show that QVQ-VAE consistently outperforms existing methods while using significantly fewer parameters.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"151-155"},"PeriodicalIF":3.2,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
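The abstract above does not spell out the quaternion-based quantization layer, but its core step can be sketched as a nearest-codeword lookup over quaternion-valued codewords, treating each latent vector as groups of four real channels (one quaternion per group). This is a hedged illustration under assumed shapes and naming, not the authors' implementation; note that the quaternion norm over each 4-channel group coincides with the Euclidean norm over the flattened real representation.

```python
import numpy as np

def quaternion_vq(z, codebook):
    """Nearest-codeword quantization for quaternion-valued latents (sketch).

    z:        (N, 4*D) array -- N latent vectors, each D quaternions
              flattened into 4 real channels per quaternion
    codebook: (K, 4*D) array -- K quaternion codewords in the same layout
    Returns (indices, quantized) with quantized[i] = codebook[indices[i]].
    """
    # Squared distances between every latent and every codeword; the
    # quaternion norm over 4-channel groups equals the Euclidean norm
    # over the flattened real representation, so a plain L2 search works.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]
```

In a full VQ-VAE this lookup would sit between encoder and decoder, with a straight-through gradient and codebook/commitment losses; only the distance computation changes when moving to the quaternion domain.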
Range-Ambiguous Clutter Separation via Reweighted Atomic Norm Minimization With EPC-MIMO Radar
IF 3.2, CAS Zone 2 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2024-11-21 DOI: 10.1109/LSP.2024.3504339
Jie Gao;Shengqi Zhu;Lan Lan;Jinxin Sui;Ximin Li
The existence of range ambiguity and range dependence seriously degrades the performance of space-time adaptive processing (STAP). In this regard, an adaptive range-ambiguous clutter separation method suitable for element-pulse coding (EPC) multiple-input multiple-output (MIMO) radar is developed in this letter. By introducing the EPC factor in both transmit elements and pulses, clutter located in different range-ambiguous regions can be distinguished in the transmit spatial frequency dimension. In particular, the EPC factor is designed to ensure the separation performance for range-ambiguous clutter. Moreover, an approach based on reweighted atomic norm minimization (RANM) is developed to separate the range-ambiguous clutter, leveraging the transmit spatial frequencies of clutter located in the various range-ambiguity areas. Furthermore, after clutter separation, the clutter is canceled via STAP individually in each range-ambiguous region. A series of simulation results validates the efficacy of the proposed approach.
Jie Gao; Shengqi Zhu; Lan Lan; Jinxin Sui; Ximin Li, "Range-Ambiguous Clutter Separation via Reweighted Atomic Norm Minimization With EPC-MIMO Radar," IEEE Signal Processing Letters, vol. 32, pp. 146-150, 2024. DOI: 10.1109/LSP.2024.3504339
Citations: 0
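The separation mechanism described in the abstract, where the EPC factor tags each range-ambiguity region with its own transmit spatial-frequency offset, can be illustrated with a small numerical sketch. The model below is a deliberately simplified assumption (single fixed pulse, ideal point echoes, a linearly increasing EPC phase of gamma cycles per element per pulse) and uses an FFT across elements merely to read off the offsets; the paper's actual method separates the regions via RANM, not an FFT.

```python
import numpy as np

M = 16                 # number of transmit elements
gamma = 1.0 / M        # assumed EPC phase increment (cycles) per element per pulse
k = 5                  # current pulse index
m = np.arange(M)

def element_snapshot(p, k):
    """Transmit-side phase of an echo from the p-th range-ambiguous interval.

    Such an echo was launched p pulses earlier, so across the elements it
    carries the EPC phase of pulse k - p rather than pulse k.
    """
    return np.exp(1j * 2 * np.pi * gamma * m * (k - p))

# Decoding removes the current pulse's EPC phase; an echo of ambiguity
# order p then keeps a residual offset of -gamma*p cycles/element in the
# transmit spatial frequency dimension.
decode = np.exp(-1j * 2 * np.pi * gamma * m * k)

def residual_frequency(p):
    residual = element_snapshot(p, k) * decode
    spectrum = np.abs(np.fft.fft(residual))
    return np.fft.fftfreq(M)[spectrum.argmax()]
```

With gamma = 1/16, regions p = 0, 1, 2 land on offsets 0, -1/16, and -1/8 cycles/element, so they occupy distinct transmit spatial-frequency bins and can be isolated before clutter is canceled per region by STAP.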