
Latest Publications in IEEE Signal Processing Letters

Infrared Small Target Detection via Local-Global Feature Fusion
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3523226
Lang Wu;Yong Ma;Fan Fan;Jun Huang
Due to high-luminance (HL) background clutter in infrared (IR) images, existing IR small target detection methods struggle to strike a good balance between efficiency and performance. Because HL clutter is difficult to suppress and leads to a high false alarm rate, this letter proposes an IR small target detection method based on local-global feature fusion (LGFF). We develop a fast and efficient local feature extraction operator and use global rarity to characterize the global feature of small targets, effectively suppressing a significant amount of HL clutter. By integrating local and global features, we further enhance the targets and robustly suppress the clutter. Experimental results demonstrate that the proposed method outperforms existing methods in terms of target enhancement, clutter removal, and real-time performance.
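The letter's exact operators are not reproduced here, but the fusion idea can be sketched in a few lines of Python. The ring-contrast local measure and the histogram-based rarity below are illustrative assumptions standing in for the paper's local extraction operator and global rarity, not its definitions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lgff_response(img, cell=3):
    """Illustrative local-global fusion map for IR small target detection."""
    img = img.astype(np.float64)
    # Local feature: center brightness minus the mean of the surrounding
    # ring, a cheap stand-in for the paper's local extraction operator.
    center = uniform_filter(img, size=cell)
    surround = uniform_filter(img, size=3 * cell)
    ring = (9.0 * surround - center) / 8.0         # mean of the outer ring
    local = np.clip(center - ring, 0.0, None)      # keep bright targets only
    # Global feature: rarity = -log(frequency of the pixel's gray level), so
    # common high-luminance clutter scores low and rare small targets score high.
    hist, edges = np.histogram(img, bins=64)
    prob = hist / hist.sum()
    idx = np.digitize(img, edges[1:-1])            # bin index per pixel, 0..63
    rarity = -np.log(prob[idx] + 1e-12)
    # Fusion: the product keeps pixels that are salient in BOTH views.
    return local * rarity

# resp = lgff_response(ir_frame); mask = resp > resp.mean() + 4 * resp.std()
```

Multiplying the two maps suppresses clutter that is locally bright but globally common, which is the intuition behind the reported false-alarm reduction.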
Citations: 0
GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-26 | DOI: 10.1109/LSP.2024.3522852
Chengzhong Wang;Jianjun Gu;Dingding Yao;Junfeng Li;Yonghong Yan
Speech enhancement aims to improve the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained considerable attention in the speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover the clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.
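A minimal sketch of the guided anisotropic forward step, assuming the guidance mask is derived from the mixture's spectrogram magnitude; the mask construction is our illustrative choice, not necessarily the paper's:

```python
import numpy as np

def guided_forward_step(x0, spec_noisy, t, alphas_cumprod, eps=1e-8):
    """Forward diffusion with anisotropic, mixture-guided noise (sketch).

    x0         : feature being diffused, same shape as spec_noisy.
    spec_noisy : |STFT| of the recording, used only to build the mask.
    """
    a_bar = alphas_cumprod[t]
    # Mask in [0, 1]: near 1 where the mixture is weak (noise-dominated),
    # near 0 where it is strong, so 'clean clues' are barely perturbed.
    mask = 1.0 - spec_noisy / (spec_noisy.max() + eps)
    noise = np.random.randn(*x0.shape) * mask      # anisotropic Gaussian noise
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
```

Because strong time-frequency bins are never fully destroyed, the reverse network only has to regenerate the noise-dominated regions, which is consistent with the parameter savings the letter reports.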
Citations: 0
Enhancing No-Reference Audio-Visual Quality Assessment via Joint Cross-Attention Fusion
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3522855
Zhaolin Wan;Xiguang Hao;Xiaopeng Fan;Wangmeng Zuo;Debin Zhao
As the consumption of multimedia content continues to rise, audio and video have become central to everyday entertainment and social interactions. This growing reliance amplifies the demand for effective and objective audio-visual quality assessment (AVQA) to understand the interaction between audio and visual elements, ultimately enhancing user satisfaction. However, existing state-of-the-art AVQA methods often rely on simplistic machine learning models or fully connected networks for audio-visual signal fusion, which limits their ability to exploit the complementary nature of these modalities. In response to this gap, we propose a novel no-reference AVQA method that utilizes joint cross-attention fusion of audio-visual perception. Our approach begins with a dual-stream feature extraction process that simultaneously captures long-range spatiotemporal visual features and audio features. The fusion model then dynamically adjusts the contributions of features from both modalities, effectively integrating them to provide a more comprehensive perception for quality score prediction. Experimental results on the LIVE-SJTU and UnB-AVC datasets demonstrate that our model outperforms state-of-the-art methods, achieving superior performance in audio-visual quality assessment.
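A joint cross-attention fusion block of the kind described can be sketched in PyTorch; the feature dimension, head count, and mean-pooled regression head are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class JointCrossAttentionFusion(nn.Module):
    """Two-way cross-attention between audio and visual streams (sketch)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 1)         # quality-score regressor

    def forward(self, vis, aud):                  # (B, Tv, dim), (B, Ta, dim)
        # Each modality queries the other, so contributions adapt per input.
        v_ctx, _ = self.a2v(vis, aud, aud)        # visual attends to audio
        a_ctx, _ = self.v2a(aud, vis, vis)        # audio attends to visual
        fused = torch.cat([v_ctx.mean(1), a_ctx.mean(1)], dim=-1)
        return self.head(fused).squeeze(-1)       # one score per clip

# score = JointCrossAttentionFusion()(torch.randn(2, 30, 256), torch.randn(2, 50, 256))
```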
Citations: 0
Outlier Indicator Based Projection Fuzzy K-Means Clustering for Hyperspectral Image
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3521714
Xinze Liu;Xiaojun Yang;Jiale Zhang;Jing Wang;Feiping Nie
Hyperspectral image (HSI) clustering has become widely used in the field of remote sensing. Traditional fuzzy K-means clustering methods often struggle with HSI data due to significant levels of noise, resulting in segmentation inaccuracies. To address this limitation, this letter introduces an outlier indicator-based projection fuzzy K-means clustering (OIPFK) algorithm for HSI data, enhancing the efficacy and robustness of previous fuzzy K-means methodologies through a two-pronged strategy. First, an outlier indicator vector is constructed to identify noise and outliers by computing the distances between data points in a reduced-dimensional space. Subsequently, the OIPFK algorithm incorporates the fuzzy membership relationships between samples and clustering centers within this lower-dimensional framework, along with the outlier indicator vectors, to significantly mitigate the influence of noise and extraneous features. Moreover, an efficient iterative optimization algorithm is employed to address the optimization challenges inherent to OIPFK. Experimental results on three real-world hyperspectral image datasets demonstrate the effectiveness and superiority of the proposed method.
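The mechanism can be conveyed with a plain fuzzy K-means loop augmented by a per-sample outlier indicator; this sketch omits the paper's projection step and uses an assumed exponential indicator, so it illustrates the spirit rather than the exact OIPFK updates:

```python
import numpy as np

def fuzzy_kmeans_outlier(X, k, m=2.0, iters=50, gamma=0.9, seed=0):
    """Fuzzy K-means with an outlier indicator down-weighting noisy samples."""
    n, _ = X.shape
    rng = np.random.default_rng(seed)
    C = X[rng.choice(n, k, replace=False)]            # initial centers
    for _ in range(iters):
        D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) + 1e-12
        U = 1.0 / D ** (1.0 / (m - 1.0))              # FCM memberships
        U /= U.sum(1, keepdims=True)
        # Outlier indicator: shrinks toward 0 when even the best-matching
        # center is far away, so outliers barely move the centers.
        dmin = D.min(1)
        o = np.exp(-dmin / (gamma * np.median(dmin)))
        W = (U ** m) * o[:, None]                     # indicator-weighted memberships
        C = (W.T @ X) / W.sum(0)[:, None]             # center update
    return U.argmax(1), C, o
```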
Citations: 0
Interbeat Interval Filtering
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3522853
İlker Bayram
Several inhibitory and excitatory factors regulate the beating of the heart. Consequently, the interbeat intervals (IBIs) vary around a mean value. Various statistics have been proposed to capture heart rate variability (HRV) and give a glimpse into this balance. However, these statistics require accurate estimation of IBIs as a first step, which can be challenging, especially for signals recorded in ambulatory conditions. We propose a lightweight state-space filter that models the IBIs as samples of an inverse Gaussian distribution with time-varying parameters. We make the filter robust against outliers by adapting the probabilistic data association filter to this setup. We demonstrate that the resulting filter can accurately identify outliers and that the parameters of the tracked distribution can be used to compute a specific HRV statistic (standard deviation of normal-to-normal intervals, SDNN) without further analysis.
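For intuition, an inverse Gaussian with mean mu and shape lam has variance mu^3/lam, so SDNN can be read off the tracked parameters with no second pass over the data. The gating tracker below is a heavily simplified stand-in for the paper's probabilistic data association filter, with assumed parameter values:

```python
import numpy as np

def track_ibis(ibis, lam=20.0, alpha=0.05, thresh=3.0):
    """Toy robust IBI tracker with an inverse-Gaussian noise model (sketch)."""
    mu = float(np.median(ibis[:5]))           # crude initialization
    flags = []
    for x in ibis:
        sd = np.sqrt(mu ** 3 / lam)           # model std. dev. = sqrt(mu^3/lam)
        outlier = abs(x - mu) > thresh * sd   # hard gate instead of soft association
        flags.append(outlier)
        if not outlier:
            mu = (1.0 - alpha) * mu + alpha * x   # EMA update on inliers only
    sdnn = np.sqrt(mu ** 3 / lam)             # SDNN read off the tracked model
    return np.array(flags), mu, sdnn
```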
Citations: 0
STSPhys: Enhanced Remote Heart Rate Measurement With Spatial-Temporal SwiftFormer
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3522854
Hyunduk Kim;Sang-Heon Lee;Myoung-Kyu Sohn;Jungkwang Kim;Hyeyoung Park
Estimating heart activity and physiological signals from facial video without any contact, known as remote photoplethysmography and remote heart rate estimation, holds significant potential for numerous applications. In this letter, we present a novel approach for remote heart rate measurement leveraging a Spatial-Temporal SwiftFormer architecture (STSPhys). Our model addresses the limitations of existing methods that rely heavily on 3D CNNs or 3D visual transformers, which often suffer from large parameter counts and potential instability during training. By integrating both spatial and temporal information from facial video data, STSPhys achieves robust and accurate heart rate estimation. Additionally, we introduce a hybrid loss function that integrates constraints from both the time and frequency domains, further enhancing the model's accuracy. Experimental results demonstrate that STSPhys significantly outperforms existing state-of-the-art methods on intra-dataset and cross-dataset tests, achieving superior performance with fewer parameters and lower computational complexity.
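A hybrid time-plus-frequency loss of the kind described can be sketched as follows; the negative-Pearson time term, spectral L1 frequency term, and their weighting are assumptions, not the paper's exact formulation:

```python
import torch

def hybrid_rppg_loss(pred, target, w_freq=1.0, eps=1e-8):
    """Time-domain correlation loss plus frequency-domain spectral loss (sketch).

    pred, target: rPPG waveforms of shape (batch, time).
    """
    # Time domain: 1 - Pearson correlation rewards matching waveform shape.
    p = pred - pred.mean(-1, keepdim=True)
    t = target - target.mean(-1, keepdim=True)
    corr = (p * t).sum(-1) / (p.norm(dim=-1) * t.norm(dim=-1) + eps)
    time_loss = (1.0 - corr).mean()
    # Frequency domain: L1 between normalized magnitude spectra penalizes
    # a mismatched dominant (heart-rate) frequency.
    P = torch.fft.rfft(pred, dim=-1).abs()
    T = torch.fft.rfft(target, dim=-1).abs()
    P = P / (P.sum(-1, keepdim=True) + eps)
    T = T / (T.sum(-1, keepdim=True) + eps)
    freq_loss = (P - T).abs().sum(-1).mean()
    return time_loss + w_freq * freq_loss
```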
Citations: 0
Adaptive Surveillance Video Compression With Background Hyperprior
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3521663
Yu Zhao;Song Tang;Mao Ye
Neural surveillance video compression methods have demonstrated significant improvements over traditional video compression techniques. In current surveillance video compression frameworks, the first frame in a Group of Pictures (GOP) is usually compressed fully as an I frame, and the subsequent P frames are compressed by referencing this I frame in Low Delay P (LDP) encoding mode. However, this approach overlooks background information, which limits its adaptability to different scenarios. In this paper, we propose a novel Adaptive Surveillance Video Compression framework based on a background hyperprior, dubbed ASVC. The background hyperprior serves as side information to assist coding in both the temporal and spatial domains. Our method consists of two main components. First, the background information of a GOP is extracted, modeled as a hyperprior, and compressed by existing methods. This hyperprior is then used as side information to compress both I frames and P frames. ASVC effectively captures the temporal dependencies in the latent representations of surveillance videos by leveraging the background hyperprior for auxiliary video encoding. Experimental results demonstrate that applying ASVC to traditional and learning-based methods significantly improves performance.
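A minimal sketch of the background-extraction step, with a temporal median standing in for whatever background model ASVC actually uses (our assumption); the extracted background would be compressed once and then reused as side information for every frame in the GOP:

```python
import numpy as np

def gop_background(frames):
    """Per-pixel temporal median over a GOP as a simple background estimate."""
    return np.median(np.stack(frames, axis=0), axis=0)

# bg = gop_background(gop_frames)           # code bg once with any image codec
# residuals = [f - bg for f in gop_frames]  # frames conditioned on the background prior
```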
Citations: 0
Fast Beam Pattern Synthesis Based on Vector Accelerated Alternating Direction Multiplier Method
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-25 | DOI: 10.1109/LSP.2024.3522858
Qiyan Song
The alternating direction multiplier method (ADMM) has been employed to iteratively solve convex optimization problems with multiple constraints in beamforming scenarios. Faster beamforming can help improve the response speed of acoustic devices in applications such as sound field reconstruction and speech enhancement. In this study, an accelerated ADMM for faster beam pattern synthesis is proposed and compared to traditional ADMMs. Based on the principle of vector acceleration, the computation of the dual and auxiliary variables is expedited to improve the computational speed of the ADMM beamforming algorithm. Simulation results show that the proposed algorithm reduces the overall computational time by approximately 30% and achieves more accurate results in less time compared to traditional ADMM beamforming algorithms.
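The vector-acceleration idea can be illustrated on a generic ADMM instance; below, a LASSO problem stands in for beam pattern synthesis (an assumption), and the Nesterov-style extrapolation of the auxiliary variable z and the dual u mirrors the expedited updates the letter describes:

```python
import numpy as np

def accelerated_admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """ADMM with momentum on the auxiliary and dual variables (sketch)."""
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    z_hat, u_hat = z.copy(), u.copy()
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once, reuse
    t = 1.0
    for _ in range(iters):
        # x-update: ridge solve against the EXTRAPOLATED z_hat, u_hat
        rhs = A.T @ b + rho * (z_hat - u_hat)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # z-update: soft thresholding; u-update: dual ascent
        v = x + u_hat
        z_new = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u_new = u_hat + x - z_new
        # Vector acceleration: Nesterov-style momentum on z and u
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_new
        z_hat = z_new + beta * (z_new - z)
        u_hat = u_new + beta * (u_new - u)
        z, u, t = z_new, u_new, t_new
    return x
```

Only the z- and u-updates gain a momentum step, so the per-iteration cost is essentially unchanged while the iteration count drops, which is where a time saving of the reported magnitude would come from.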
Citations: 0
ResEmoteNet: Bridging Accuracy and Loss Reduction in Facial Emotion Recognition
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-23 | DOI: 10.1109/LSP.2024.3521321
Arnab Kumar Roy;Hemant Kumar Kathania;Adhitiya Sharma;Abhishek Dey;Md. Sarfaraj Alam Ansari
The human face is a silent communicator, expressing emotions and thoughts through its facial expressions. With the advancements in computer vision in recent years, facial emotion recognition technology has made significant strides, enabling machines to decode the intricacies of facial cues. In this work, we propose ResEmoteNet, a novel deep learning architecture for facial emotion recognition that combines Convolutional, Squeeze-Excitation (SE), and Residual networks. The SE block selectively focuses on the important features of the human face, enhancing the feature representation and suppressing less relevant ones. This helps reduce the loss and improve overall model performance. We also integrate the SE block with three residual blocks, which help the network learn more complex representations of the data through deeper layers. We evaluated ResEmoteNet on four open-source databases: FER2013, RAF-DB, AffectNet-7, and ExpW, achieving accuracies of 79.79%, 94.76%, 72.39%, and 75.67%, respectively. The proposed network outperforms state-of-the-art models across all four databases.
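The Squeeze-Excitation mechanism the network builds on is standard and can be sketched in PyTorch; the reduction ratio and placement are assumptions, since the abstract only states that SE blocks are combined with convolutional and residual layers:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels by global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                   # squeeze: global average pool
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excite: scale each channel

# y = SEBlock(64)(torch.randn(2, 64, 48, 48))   # same shape out, channels reweighted
```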
Citations: 0
A Simple and Efficient Method for Hybrid AOA and DTD Localization With Unknown Transmitter Location
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-23 | DOI: 10.1109/LSP.2024.3521317
Yanbin Zou;Yangpeng Xiao;Weien Zhang
Recently, joint target and transmitter localization using differential time-delay (DTD) and angle-of-arrival (AOA) measurements has attracted researchers' interest. Because three Euclidean norms appear in the DTD equation, it is difficult to tackle directly. In this paper, we divide the joint localization problem into three subproblems: the AOA-only localization problem, the hybrid AOA and time-difference-of-arrival (TDOA) localization problem, and the hybrid AOA and time-delay (TD) localization problem with known transmitter location. We then develop a two-stage algorithm. In the first stage, solving the AOA-only localization problem provides initial estimates. In the second stage, alternately and iteratively solving the hybrid AOA and TDOA localization problem and the hybrid AOA and TD localization problem provides improved solutions. Simulation results validate that the proposed algorithm is superior to the existing constrained weighted least-squares (CWLS) algorithm when the AOA noise variance is not sufficiently small. Index Terms: angle-of-arrival (AOA), differential time-delay (DTD), time-delay (TD), time-difference-of-arrival (TDOA), elliptic localization.
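The first stage has a standard closed form in 2-D: each bearing defines a line through its sensor, and stacking the line equations gives a linear least-squares problem. The sketch below shows only this AOA-only initializer, under an assumed 2-D geometry; the alternating hybrid AOA/TDOA and AOA/TD refinements are not reproduced:

```python
import numpy as np

def aoa_least_squares(sensors, thetas):
    """Pseudolinear least-squares AOA-only target estimate in 2-D (sketch).

    A bearing theta_i from sensor s_i constrains the target (x, y) to the line
    sin(theta_i) * (x - s_ix) - cos(theta_i) * (y - s_iy) = 0.
    """
    s = np.asarray(sensors, dtype=float)          # (N, 2) sensor positions
    t = np.asarray(thetas, dtype=float)           # (N,) bearings in radians
    A = np.stack([np.sin(t), -np.cos(t)], axis=1)
    b = np.sin(t) * s[:, 0] - np.cos(t) * s[:, 1]
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)   # needs >= 2 bearings
    return pos                                    # initial (x, y) estimate
```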
Citations: 0