Pub Date : 2024-12-26 DOI: 10.1109/LSP.2024.3523226
Lang Wu;Yong Ma;Fan Fan;Jun Huang
Due to high-luminance (HL) background clutter in infrared (IR) images, existing IR small target detection methods struggle to strike a good balance between efficiency and performance. Because HL clutter is difficult to suppress and leads to a high false alarm rate, this letter proposes an IR small target detection method based on local-global feature fusion (LGFF). We develop a fast and efficient local feature extraction operator and use global rarity to characterize the global feature of small targets, effectively suppressing a significant amount of HL clutter. By integrating local and global features, we further enhance the targets and robustly suppress the clutter. Experimental results demonstrate that the proposed method outperforms existing methods in terms of target enhancement, clutter removal, and real-time performance.
{"title":"Infrared Small Target Detection via Local-Global Feature Fusion","authors":"Lang Wu;Yong Ma;Fan Fan;Jun Huang","doi":"10.1109/LSP.2024.3523226","DOIUrl":"https://doi.org/10.1109/LSP.2024.3523226","url":null,"abstract":"Due to the high-luminance (HL) background clutter in infrared (IR) images, the existing IR small target detection methods struggle to achieve a good balance between efficiency and performance. Addressing the issue of HL clutter, which is difficult to suppress, leading to a high false alarm rate, this letter proposes an IR small target detection method based on local-global feature fusion (LGFF). We develop a fast and efficient local feature extraction operator and utilize global rarity to characterize the global feature of small targets, effectively suppressing a significant amount of HL clutter. By integrating local and global features, we achieve further enhancement of the targets and robust suppression of the clutter. Experimental results demonstrate that the proposed method outperforms existing methods in terms of target enhancement, clutter removal, and real-time performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"466-470"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-26 DOI: 10.1109/LSP.2024.3522852
Chengzhong Wang;Jianjun Gu;Dingding Yao;Junfeng Li;Yonghong Yan
Speech enhancement is designed to improve the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained considerable attention in the speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover the clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.
{"title":"GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement","authors":"Chengzhong Wang;Jianjun Gu;Dingding Yao;Junfeng Li;Yonghong Yan","doi":"10.1109/LSP.2024.3522852","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522852","url":null,"abstract":"Speech enhancement is designed to enhance the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained lots of attention in speech enhancement area, achieving competitive results. Current diffusion-based methods blur the distribution of the signal with isotropic Gaussian noise and recover clean speech distribution from the prior. However, these methods often suffer from a substantial computational burden. We argue that the computational inefficiency partially stems from the oversight that speech enhancement is not purely a generative task; it primarily involves noise reduction and completion of missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy recordings. This approach substantially reduces computational complexity while exhibiting robustness against various forms of noise and speech distortion. Experiments demonstrate that the proposed method achieves state-of-the-art results with only approximately 4.5 million parameters, a number significantly lower than that required by other diffusion methods. This effectively narrows the model size disparity between diffusion-based and predictive speech enhancement approaches. Additionally, the proposed method performs well in very noisy scenarios, demonstrating its potential for applications in highly challenging environments.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"426-430"},"PeriodicalIF":3.2,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142925372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3522855
Zhaolin Wan;Xiguang Hao;Xiaopeng Fan;Wangmeng Zuo;Debin Zhao
As the consumption of multimedia content continues to rise, audio and video have become central to everyday entertainment and social interactions. This growing reliance amplifies the demand for effective and objective audio-visual quality assessment (AVQA) to understand the interaction between audio and visual elements, ultimately enhancing user satisfaction. However, existing state-of-the-art AVQA methods often rely on simplistic machine learning models or fully connected networks for audio-visual signal fusion, which limits their ability to exploit the complementary nature of these modalities. In response to this gap, we propose a novel no-reference AVQA method that utilizes joint cross-attention fusion of audio-visual perception. Our approach begins with a dual-stream feature extraction process that simultaneously captures long-range spatiotemporal visual features and audio features. The fusion model then dynamically adjusts the contributions of features from both modalities, effectively integrating them to provide a more comprehensive perception for quality score prediction. Experimental results on the LIVE-SJTU and UnB-AVC datasets demonstrate that our model outperforms state-of-the-art methods, achieving superior performance in audio-visual quality assessment.
Title: Enhancing No-Reference Audio-Visual Quality Assessment via Joint Cross-Attention Fusion. IEEE Signal Processing Letters, vol. 32, pp. 556-560.
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3521714
Xinze Liu;Xiaojun Yang;Jiale Zhang;Jing Wang;Feiping Nie
Hyperspectral image (HSI) clustering has become widely used in the field of remote sensing. Traditional fuzzy K-means clustering methods often struggle with HSI data due to significant levels of noise, resulting in segmentation inaccuracies. To address this limitation, this letter introduces an innovative outlier indicator-based projection fuzzy K-means clustering (OIPFK) algorithm for HSI data, enhancing the efficacy and robustness of previous fuzzy K-means methodologies through a two-pronged strategy. Initially, an outlier indicator vector is constructed to identify noise and outliers by computing distances between data points in a reduced-dimensional space. Subsequently, the OIPFK algorithm incorporates the fuzzy membership relationships between samples and clustering centers within this lower-dimensional framework, along with the outlier indicator vectors, to significantly mitigate the influence of noise and extraneous features. Moreover, an efficient iterative optimization algorithm is employed to address the optimization challenges inherent to OIPFK. Experimental results from three real-world hyperspectral image datasets demonstrate the effectiveness and superiority of our proposed method.
{"title":"Outlier Indicator Based Projection Fuzzy K-Means Clustering for Hyperspectral Image","authors":"Xinze Liu;Xiaojun Yang;Jiale Zhang;Jing Wang;Feiping Nie","doi":"10.1109/LSP.2024.3521714","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521714","url":null,"abstract":"The application of hyperspectral image (HSI) clustering has become widely used in the field of remote sensing. Traditional fuzzy K-means clustering methods often struggle with HSI data due to the significant levels of noise, consequently resulting in segmentation inaccuracies. To address this limitation, this letter introduces an innovative outlier indicator-based projection fuzzy K-means clustering (OIPFK) algorithm for clustering of HSI data, enhancing the efficacy and robustness of previous fuzzy K-means methodologies through a two-pronged strategy. Initially, an outlier indicator vector is constructed to identify noise and outliers by computing the distances between each data point in a reduced dimensional space. Subsequently, the OIPFK algorithm incorporates the fuzzy membership relationships between samples and clustering centers within this lower-dimensional framework, along with the integration of the outlier indicator vectors, to significantly mitigates the influence of noise and extraneous features. Moreover, an efficient iterative optimization algorithm is employed to address the optimization challenges inherent to OIPKM. Experimental results from three real-world hyperspectral image datasets demonstrate the effectiveness and superiority of our proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"496-500"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3522853
İlker Bayram
Several inhibitory and excitatory factors regulate the beating of the heart. Consequently, the interbeat intervals (IBIs) vary around a mean value. Various statistics have been proposed to capture heart rate variability (HRV) to give a glimpse into this balance. However, these statistics require accurate estimation of IBIs as a first step, which can be challenging especially for signals recorded in ambulatory conditions. We propose a lightweight state-space filter that models the IBIs as samples of an inverse Gaussian distribution with time-varying parameters. We make the filter robust against outliers by adapting the probabilistic data association filter to the setup. We demonstrate that the resulting filter can accurately identify outliers and the parameters of the tracked distribution can be used to compute a specific HRV statistic (standard deviation of normal-to-normal intervals, SDNN) without further analysis.
{"title":"Interbeat Interval Filtering","authors":"İlker Bayram","doi":"10.1109/LSP.2024.3522853","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522853","url":null,"abstract":"Several inhibitory and excitatory factors regulate the beating of the heart. Consequently, the interbeat intervals (IBIs) vary around a mean value. Various statistics have been proposed to capture heart rate variability (HRV) to give a glimpse into this balance. However, these statistics require accurate estimation of IBIs as a first step, which can be challenging especially for signals recorded in ambulatory conditions. We propose a lightweight state-space filter that models the IBIs as samples of an inverse Gaussian distribution with time-varying parameters. We make the filter robust against outliers by adapting the probabilistic data association filter to the setup. We demonstrate that the resulting filter can accurately identify outliers and the parameters of the tracked distribution can be used to compute a specific HRV statistic (standard deviation of normal-to-normal intervals, SDNN) without further analysis.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"481-485"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3522854
Hyunduk Kim;Sang-Heon Lee;Myoung-Kyu Sohn;Jungkwang Kim;Hyeyoung Park
Estimating heart activities and physiological signals from facial video without any contact, known as remote photoplethysmography and remote heart rate estimation, holds significant potential for numerous applications. In this letter, we present a novel approach for remote heart rate measurement leveraging a Spatial-Temporal SwiftFormer architecture (STSPhys). Our model addresses the limitations of existing methods that rely heavily on 3D CNNs or 3D visual transformers, which often suffer from increased parameters and potential instability during training. By integrating both spatial and temporal information from facial video data, STSPhys achieves robust and accurate heart rate estimation. Additionally, we introduce a hybrid loss function that integrates constraints from both the time and frequency domains, further enhancing the model's accuracy. Experimental results demonstrate that STSPhys significantly outperforms existing state-of-the-art methods on intra-dataset and cross-dataset tests, achieving superior performance with fewer parameters and lower computational complexity.
{"title":"STSPhys: Enhanced Remote Heart Rate Measurement With Spatial-Temporal SwiftFormer","authors":"Hyunduk Kim;Sang-Heon Lee;Myoung-Kyu Sohn;Jungkwang Kim;Hyeyoung Park","doi":"10.1109/LSP.2024.3522854","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522854","url":null,"abstract":"Estimating heart activities and physiological signals from facial video without any contact, known as remote photoplethysmography and remote heart rate estimation, holds significant potential for numerous applications. In this letter, we present a novel approach for remote heart rate measurement leveraging a Spatial-Temporal SwiftFormer architecture (STSPhys). Our model addresses the limitations of existing methods that rely heavily on 3D CNNs or 3D visual transformers, which often suffer from increased parameters and potential instability during training. By integrating both spatial and temporal information from facial video data, STSPhys achieves robust and accurate heart rate estimation. Additionally, we introduce a hybrid loss function that integrates constraints from both the time and frequency domains, further enhancing the model's accuracy. Experimental results demonstrate that STSPhys significantly outperforms existing state-of-the-art methods on intra-dataset and cross-dataset tests, achieving superior performance with fewer parameters and lower computational complexity.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"521-525"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3521663
Yu Zhao;Song Tang;Mao Ye
Neural surveillance video compression methods have demonstrated significant improvements over traditional video compression techniques. In current surveillance video compression frameworks, the first frame in a Group of Pictures (GOP) is usually compressed fully as an I frame, and the subsequent P frames are compressed by referencing this I frame in Low Delay P (LDP) encoding mode. However, this compression approach overlooks background information, which limits its adaptability to different scenarios. In this paper, we propose a novel Adaptive Surveillance Video Compression framework based on a background hyperprior, dubbed ASVC. This background hyperprior serves as side information to assist coding in both the temporal and spatial domains. Our method consists of two main components. First, the background information of a GOP is extracted, modeled as a hyperprior, and compressed by existing methods. Then this hyperprior is used as side information to compress both I frames and P frames. ASVC effectively captures the temporal dependencies in the latent representations of surveillance videos by leveraging the background hyperprior for auxiliary video encoding. Experimental results demonstrate that applying ASVC to traditional and learning-based methods significantly improves performance.
{"title":"Adaptive Surveillance Video Compression With Background Hyperprior","authors":"Yu Zhao;Song Tang;Mao Ye","doi":"10.1109/LSP.2024.3521663","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521663","url":null,"abstract":"Neural surveillance video compression methods have demonstrated significant improvements over traditional video compression techniques. In current surveillance video compression frameworks, the first frame in a Group of Pictures (GOP) is usually compressed fully as an I frame, and the subsequent P frames are compressed by referencing this I frame at Low Delay P (LDP) encoding mode. However, this compression approach overlooks the utilization of background information, which limits its adaptability to different scenarios. In this paper, we propose a novel Adaptive Surveillance Video Compression framework based on background hyperprior, dubbed as ASVC. This background hyperprior is related with side information to assist in coding both the temporal and spatial domains. Our method mainly consists of two components. First, the background information from a GOP is extracted, modeled as hyperprior and is compressed by exiting methods. Then these hyperprior is used as side information to compress both I frames and P frames. ASVC effectively captures the temporal dependencies in the latent representations of surveillance videos by leveraging background hyperprior for auxiliary video encoding. The experimental results demonstrate that applying ASVC to traditional and learning based methods significantly improves performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"456-460"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25 DOI: 10.1109/LSP.2024.3522858
Qiyan Song
The alternating direction multiplier method (ADMM) has been employed to iteratively solve convex optimization problems with multiple constraints in beamforming scenarios. Faster beamforming can improve the response speed of acoustic devices in applications such as sound field reconstruction and speech enhancement. In this study, an accelerated ADMM for faster beam pattern synthesis is proposed and compared with traditional ADMMs. Based on the principle of vector acceleration, the computation of the dual and auxiliary variables is expedited to improve the computational speed of the ADMM beamforming algorithm. Simulation results show that the proposed algorithm reduces the overall computational time by approximately 30% and achieves more accurate results in less time compared to traditional ADMM beamforming algorithms.
{"title":"Fast Beam Pattern Synthesis Based on Vector Accelerated Alternating Direction Multiplier Method","authors":"Qiyan Song","doi":"10.1109/LSP.2024.3522858","DOIUrl":"https://doi.org/10.1109/LSP.2024.3522858","url":null,"abstract":"The alternating direction multiplier method (ADMM) has been employed to iteratively solve convex optimization problems with multiple constraints in be amforming scenarios. Faster beamforming can help improve the response speed of acoustic devices in scenarios such as sound field reconstruction and speech enhancement. In this study, an accelerated ADMM for faster beam pattern synthesis is proposed and compared to traditional ADMMs. Based on the principle of vector acceleration, the computation of dual and auxiliary variables is expedited to improve the computational speed of ADMM beamforming algorithm. Simulation results show that the proposed algorithm reduces the overall computational time by approximately 30<inline-formula><tex-math>$%$</tex-math></inline-formula> and achieves more accurate results in less time compared to traditional ADMM beamforming algorithms.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"526-530"},"PeriodicalIF":3.2,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-23 DOI: 10.1109/LSP.2024.3521321
Arnab Kumar Roy;Hemant Kumar Kathania;Adhitiya Sharma;Abhishek Dey;Md. Sarfaraj Alam Ansari
The human face is a silent communicator, expressing emotions and thoughts through its facial expressions. With the advancements in computer vision in recent years, facial emotion recognition technology has made significant strides, enabling machines to decode the intricacies of facial cues. In this work, we propose ResEmoteNet, a novel deep learning architecture for facial emotion recognition that combines Convolutional, Squeeze-Excitation (SE) and Residual Networks. The SE block selectively focuses on the important features of the human face, enhances the feature representation, and suppresses the less relevant ones. This helps reduce the loss and improve the overall model performance. We also integrate the SE block with three residual blocks that help learn more complex representations of the data through deeper layers. We evaluated ResEmoteNet on four open-source databases: FER2013, RAF-DB, AffectNet-7 and ExpW, achieving accuracies of 79.79%, 94.76%, 72.39% and 75.67%, respectively. The proposed network outperforms state-of-the-art models across all four databases.
{"title":"ResEmoteNet: Bridging Accuracy and Loss Reduction in Facial Emotion Recognition","authors":"Arnab Kumar Roy;Hemant Kumar Kathania;Adhitiya Sharma;Abhishek Dey;Md. Sarfaraj Alam Ansari","doi":"10.1109/LSP.2024.3521321","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521321","url":null,"abstract":"The human face is a silent communicator, expressing emotions and thoughts through it's facial expressions. With the advancements in computer vision in recent years, facial emotion recognition technology has made significant strides, enabling machines to decode the intricacies of facial cues. In this work, we propose ResEmoteNet, a novel deep learning architecture for facial emotion recognition designed with the combination of Convolutional, Squeeze-Excitation (SE) and Residual Networks. The inclusion of SE block selectively focuses on the important features of the human face, enhances the feature representation and suppresses the less relevant ones. This helps in reducing the loss and enhancing the overall model performance. We also integrate the SE block with three residual blocks that help in learning more complex representation of the data through deeper layers. We evaluated ResEmoteNet on four open-source databases: FER2013, RAF-DB, AffectNet-7 and ExpW, achieving accuracies of 79.79%, 94.76%, 72.39% and 75.67% respectively. The proposed network outperforms state-of-the-art models across all four databases.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"491-495"},"PeriodicalIF":3.2,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142937894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-23 DOI: 10.1109/LSP.2024.3521317
Yanbin Zou;Yangpeng Xiao;Weien Zhang
Recently, joint target and transmitter localization using differential time-delay (DTD) and angle-of-arrival (AOA) measurements has attracted researchers' interest. Because the DTD equation contains three Euclidean norms, it is difficult to tackle directly. In this paper, we divide the joint localization problem into three subproblems: the AOA-only localization problem, the hybrid AOA and time-difference-of-arrival (TDOA) localization problem, and the hybrid AOA and time-delay (TD) localization problem with a known transmitter location. Then, a two-stage algorithm is developed. In the first stage, solving the AOA-only localization problem provides initial estimates. In the second stage, alternately and iteratively solving the hybrid AOA and TDOA localization problem and the hybrid AOA and TD localization problem provides improved solutions. Simulation results validate that the proposed algorithm is superior to the existing constrained weighted least-squares (CWLS) algorithm when the AOA noise variance is not sufficiently small.
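A minimal 2-D AOA-only least-squares solver, of the kind that could provide the first-stage initial estimate, is sketched below; the sensor geometry, noise level, and 2-D setting are assumptions, and the second-stage alternating refinement is not reproduced.

```python
# Minimal 2-D AOA-only least-squares localization (first-stage-style initial
# estimate); geometry and noise are synthetic assumptions.
import numpy as np

def aoa_only_ls(sensor_pos, bearings):
    """
    sensor_pos : (N, 2) sensor coordinates
    bearings   : (N,) measured azimuths (radians) from each sensor to the target
    Each bearing defines a line  sin(t)*x - cos(t)*y = sin(t)*xs - cos(t)*ys.
    """
    G = np.column_stack([np.sin(bearings), -np.cos(bearings)])
    h = np.sin(bearings) * sensor_pos[:, 0] - np.cos(bearings) * sensor_pos[:, 1]
    est, *_ = np.linalg.lstsq(G, h, rcond=None)
    return est

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.array([120.0, 80.0])
    sensors = rng.uniform(0, 200, size=(6, 2))
    theta = np.arctan2(target[1] - sensors[:, 1], target[0] - sensors[:, 0])
    theta += np.deg2rad(0.5) * rng.standard_normal(6)      # AOA noise
    print("initial estimate:", aoa_only_ls(sensors, theta).round(2))
```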