Learning rule in MFR pulse sequence for behavior mode prediction
Pub Date: 2024-11-07 | DOI: 10.1016/j.dsp.2024.104854
Kun Chi , Jun Hu , Liyan Wang , Jihong Shen
Radar behavior prediction is an important task in the field of electronic reconnaissance. The widely deployed multi-function radar (MFR) can transition flexibly between various work modes, so statistical rules governing these behaviors are embedded in the emitted signal sequence. Most existing radar emission prediction methods are inapplicable to non-cooperative scenarios, since labeled sequence samples are hard to obtain. To address this challenge, this paper proposes an unsupervised framework for learning the behavior rule from the pulse sequence and predicting the radar mode. The framework comprises three modules: sequence segmentation for mode-switch boundary detection, segment clustering for behavior mode recognition, and mode prediction for behavior rule extraction. The framework can predict the state and the numerical parameters of the next mode at the same time. Experimental results demonstrate that the proposed framework achieves considerable prediction performance and shows good robustness under non-ideal conditions.
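The three modules map naturally onto a segment-cluster-predict pipeline. As a loose illustration only (the boundary rule, segment features, and mode count below are assumptions, not the authors' design), a minimal sketch on a pulse repetition interval (PRI) sequence:

    # Hypothetical segment -> cluster -> predict pipeline on a PRI sequence.
    import numpy as np
    from sklearn.cluster import KMeans

    def segment(pri, jump=5.0):
        # Detect mode-switch boundaries where the PRI jumps sharply.
        bounds = np.where(np.abs(np.diff(pri)) > jump)[0] + 1
        return np.split(pri, bounds)

    def transition_matrix(labels, n_modes):
        # Estimate P(next mode | current mode) with Laplace smoothing.
        T = np.ones((n_modes, n_modes))
        for a, b in zip(labels[:-1], labels[1:]):
            T[a, b] += 1
        return T / T.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    pri = np.concatenate([rng.normal(m, 0.5, 50) for m in (100, 140, 120, 100)])
    segs = segment(pri)
    feats = np.array([[s.mean(), s.std()] for s in segs])  # per-segment statistics
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
    T = transition_matrix(labels, 3)
    nxt = int(T[labels[-1]].argmax())                      # most likely next mode
    print("next mode:", nxt, "typical PRI:", feats[labels == nxt, 0].mean())

The transition matrix plays the role of the extracted "behavior rule": the predicted state is the argmax of the current mode's row, and the cluster centroid supplies the associated numerical parameters.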
{"title":"Learning rule in MFR pulse sequence for behavior mode prediction","authors":"Kun Chi , Jun Hu , Liyan Wang , Jihong Shen","doi":"10.1016/j.dsp.2024.104854","DOIUrl":"10.1016/j.dsp.2024.104854","url":null,"abstract":"<div><div>Radar behavior prediction is an important task in the field of electronic reconnaissance. For the extensive applied multi-function radar (MFR), which can flexibly transition between various work modes and make certain statistical rule of these radar behaviors exist in the signal sequence. Most of existing radar emission prediction methods are inapplicable to the non-cooperative scenario, since the labeled sequence samples are hard to obtain. To solve this challenge, an unsupervised framework is proposed for learning the behavior rule from the pulse sequence and predicting the radar mode in this paper. The framework includes three modules of sequence segmentation for mode switch boundaries detection, segment clustering for behavior mode recognition, and mode prediction for behavior rule extraction. The application of this framework can predict state and numerical values of next mode at the same time. Experimental results demonstrate that the proposed framework has a considerable prediction performance and shows good robustness under the non-ideal conditions.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104854"},"PeriodicalIF":2.9,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An enhanced domain generalization method for object detection based on text guided feature disentanglement
Pub Date: 2024-11-07 | DOI: 10.1016/j.dsp.2024.104855
Meng Wang, Yudong Liu, Haipeng Liu
The application scenarios of object detection models change constantly due to day-night alternation and varying weather. Detectors often suffer from the scarcity of training data on these potential domains. Recently, this challenge, known as domain shift, has been relieved by single domain generalization (SDG). To further generalize towards multiple unseen domains, this paper proposes a detector that uses text semantic gaps to enhance scene diversity and utilizes feature disentangling to extract domain-invariant features from different scenes, thereby improving detection accuracy. Firstly, random semantic augmentation (RSA) is adopted, leveraging the text modality to capture semantically generalized representations and thereby augmenting the diversity of domain-related information. Secondly, by broadening the decision boundary between domain-invariant and domain-specific features, feature disentangling (FD) branches are applied to improve the detector's object-background differentiation. Additionally, cross-modality alignment (CMA) is performed by estimating the relevance between domain-specific features and textual domain prompts. Experimental results show that the proposed detector outperforms existing baselines under diverse weather conditions, such as rain, fog, and rainy nights, which also confirms its enhanced generalization ability on multiple unseen domains.
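The CMA step resembles the image-text relevance scoring used in CLIP-style models. A minimal sketch of such an alignment loss, with all names and the prompt setup assumed for illustration rather than taken from the paper:

    # Score domain-specific image features against textual domain prompts and
    # supervise with cross-entropy over the similarity logits (illustrative only).
    import torch
    import torch.nn.functional as F

    def cma_loss(img_feats, text_embeds, domain_ids, tau=0.07):
        # img_feats: (B, D); text_embeds: (K, D) embeddings of K domain prompts,
        # e.g. "a photo taken on a rainy night"; domain_ids: (B,) prompt index.
        img = F.normalize(img_feats, dim=-1)
        txt = F.normalize(text_embeds, dim=-1)
        logits = img @ txt.t() / tau          # (B, K) relevance estimates
        return F.cross_entropy(logits, domain_ids)

    B, D, K = 8, 512, 4
    loss = cma_loss(torch.randn(B, D), torch.randn(K, D), torch.randint(0, K, (B,)))
    print(loss.item())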
{"title":"An enhanced domain generalization method for object detection based on text guided feature disentanglement","authors":"Meng Wang, Yudong Liu, Haipeng Liu","doi":"10.1016/j.dsp.2024.104855","DOIUrl":"10.1016/j.dsp.2024.104855","url":null,"abstract":"<div><div>The application scenarios of object detection models are constantly changing, due to the alternation of day and night and weather changes. Detector often suffers from the scarcity of training sets on potential domains. Recently, this challenge known as domain shift has been relieved by single domain generalization (SDG). To further generalize towards multiple unseen domains, this paper proposes a detector that uses text semantic gaps to enhance scene diversity and utilizes feature disentangling to extract domain-invariant features from different scenes, thereby improving detection accuracy. Firstly, random semantic augmentation (RSA) is adopted leveraging the text modality to capture semantically generalized representations, thereby augmenting the diversity of domain related information. Second, by broadening the decision boundary between domain-invariant and domain-specific features, feature disentangling (FD) branches are applied to improve the detector's object-background differentiation. Additionally, a cross modality alignment (CMA) is performed by estimating the relevances between domain-specific features and textual domain prompts. Experimental results show the proposed detector has excellent performance among existing baselines on diverse weather conditions, such as rainy, foggy and night rainy, which also confirms the enhanced generalization ability on multiple unseen domains.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104855"},"PeriodicalIF":2.9,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MCNN-CMCA: A multiscale convolutional neural networks with cross-modal channel attention for physiological signal-based mental state recognition
Pub Date: 2024-11-07 | DOI: 10.1016/j.dsp.2024.104856
Yayun Wei, Lei Cao, Yilin Dong, Tianyu Liu
Human mental state recognition (MSR) has significant implications for human-machine interaction. Although mental state recognition models based on single-modality signals, such as the electroencephalogram (EEG) or peripheral physiological signals (PPS), have achieved encouraging progress, methods leveraging multimodal physiological signals remain underexplored. In this study, we present MCNN-CMCA, a generic model that employs multiscale convolutional neural networks (CNNs) with cross-modal channel attention to realize physiological-signal-based MSR. Specifically, we first design an innovative cross-modal channel attention mechanism that adaptively adjusts the weights of each signal channel, effectively learning both intra-modality and inter-modality correlations and expanding the channel information into the depth dimension. Additionally, the study utilizes multiscale temporal CNNs to obtain short-term and long-term time-frequency features across different modalities. Finally, the multimodal fusion module integrates the representations of all physiological signals, and the classification layer implements sparse connections by setting the mask weights to 0. We evaluate the proposed method on the SEED-VIG, DEAP, and self-made datasets, achieving superior results compared to existing state-of-the-art methods. Furthermore, we conduct ablation studies that demonstrate the effectiveness of each component in MCNN-CMCA and show that using multimodal physiological signals outperforms single-modality signals.
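A cross-modal channel attention of this kind can be sketched as a squeeze-and-excitation style block applied over the concatenated EEG and PPS channels, so the channel weights are learned jointly across modalities. Layer sizes here are placeholders, not the paper's configuration:

    # SE-style attention over concatenated multimodal channels (illustrative).
    import torch
    import torch.nn as nn

    class CrossModalChannelAttention(nn.Module):
        def __init__(self, n_channels, reduction=4):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(n_channels, n_channels // reduction), nn.ReLU(),
                nn.Linear(n_channels // reduction, n_channels), nn.Sigmoid())

        def forward(self, x):               # x: (batch, channels, time)
            w = self.fc(x.mean(dim=-1))     # squeeze time; one weight per channel
            return x * w.unsqueeze(-1)      # reweight channels of both modalities

    eeg, pps = torch.randn(2, 32, 256), torch.randn(2, 8, 256)
    attn = CrossModalChannelAttention(32 + 8)
    out = attn(torch.cat([eeg, pps], dim=1))   # (2, 40, 256)
    print(out.shape)

Because EEG and PPS channels pass through the same excitation network, each channel's weight depends on every channel's activity, which is one simple way to model inter-modality correlation.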
{"title":"MCNN-CMCA: A multiscale convolutional neural networks with cross-modal channel attention for physiological signal-based mental state recognition","authors":"Yayun Wei, Lei Cao, Yilin Dong, Tianyu Liu","doi":"10.1016/j.dsp.2024.104856","DOIUrl":"10.1016/j.dsp.2024.104856","url":null,"abstract":"<div><div>Human mental state recognition (MSR) has significant implications for human-machine interactions. Although mental state recognition models based on single-modality signals, such as electroencephalogram (EEG) or peripheral physiological signals (PPS), have achieved encouraging progress, methods leveraging multimodal physiological signals still need to be explored. In this study, we present MCNN-CMCA, a generic model that employs multiscale convolutional neural networks (CNNs) with cross-modal channel attention to realize physiological signals-based MSR. Specifically, we first design an innovative cross-modal channel attention mechanism that adaptively adjusting the weights of each signal channel, effectively learning both intra-modality and inter-modality correlation and expanding the channel information to the depth dimension. Additionally, the study utilizes multiscale temporal CNNs for obtaining short-term and long-term time-frequency features across different modalities. Finally, the multimodal fusion module integrates the representations of all physiological signals and the classification layer implements sparse connections by setting the mask weights to 0. We evaluate the proposed method on the SEED-VIG, DEAP, and self-made datasets, achieving superior results compared to existing state-of-the-art methods. Furthermore, we conduct ablation studies to demonstrate the effectiveness of each component in the MCNN-CMCA and show the use of multimodal physiological signals outperforms single-modality signals.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104856"},"PeriodicalIF":2.9,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The short-term wind power prediction based on a multi-layer stacked model of BOCNN-BiGRU-SA
Pub Date: 2024-11-07 | DOI: 10.1016/j.dsp.2024.104838
Wen Chen, Hongquan Huang, Xingke Ma, Xinhang Xu, Yi Guan, Guorui Wei, Lin Xiong, Chenglin Zhong, Dejie Chen, Zhonglin Wu
Wind power generation is influenced by various meteorological factors, exhibiting significant volatility and unpredictability. This variability presents considerable challenges for accurate wind power forecasting. In this study, we propose an innovative method for short-term wind power prediction that integrates a Bayesian-optimized Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Units (BiGRU), and a Self-Attention Mechanism (SA) within a multi-layer architecture. Initially, we preprocess features using Pearson correlation analysis and input them into the CNN to investigate complex nonlinear spatial relationships among multiple feature variables and the current load. Subsequently, the BiGRU captures long-term dependencies from both forward and backward time sequences. Finally, we implement the Self-Attention Mechanism to weight the features and generate the predicted wind power. The model's numerous hyperparameters are tuned with Bayesian optimization. Through comparative ablation experiments with varying time-segment lengths on wind farm datasets from four regions, our method significantly outperforms 11 models, including Long Short-Term Memory (LSTM), and surpasses several state-of-the-art (SOTA) prediction models, such as iTransformer, PatchTST, Non-stationary Transformers, TSMixer, and DLinear. The highest coefficient of determination (R²) achieved was 0.981, with the Symmetric Mean Absolute Percentage Error (SMAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) decreasing by 11.22% to 62.04% compared to other models. The results demonstrate the predictive accuracy and generalization performance of the proposed model.
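A rough PyTorch analogue of the stacked CNN → BiGRU → self-attention architecture is sketched below; widths, kernel sizes, and head counts are placeholder choices, and the Bayesian hyperparameter search is omitted:

    # Minimal CNN -> BiGRU -> self-attention forecaster (illustrative sizes).
    import torch
    import torch.nn as nn

    class CNNBiGRUSA(nn.Module):
        def __init__(self, n_feats, hidden=64):
            super().__init__()
            self.cnn = nn.Sequential(nn.Conv1d(n_feats, hidden, 3, padding=1), nn.ReLU())
            self.bigru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
            self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
            self.head = nn.Linear(2 * hidden, 1)

        def forward(self, x):                                # x: (batch, time, features)
            h = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # local feature patterns
            h, _ = self.bigru(h)                             # forward+backward dependencies
            h, _ = self.attn(h, h, h)                        # weight time steps by relevance
            return self.head(h[:, -1])                       # next-step wind power

    model = CNNBiGRUSA(n_feats=6)
    print(model(torch.randn(4, 48, 6)).shape)                # torch.Size([4, 1])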
{"title":"The short-term wind power prediction based on a multi-layer stacked model of BOCNN-BiGRU-SA","authors":"Wen Chen, Hongquan Huang, Xingke Ma, Xinhang Xu, Yi Guan, Guorui Wei, Lin Xiong, Chenglin Zhong, Dejie Chen, Zhonglin Wu","doi":"10.1016/j.dsp.2024.104838","DOIUrl":"10.1016/j.dsp.2024.104838","url":null,"abstract":"<div><div>Wind power generation is influenced by various meteorological factors, exhibiting significant volatility and unpredictability. This variability presents considerable challenges for accurate wind power forecasting. In this study, we propose an innovative method for short-term wind power prediction that integrates a Bayesian-optimized Convolutional Neural Network (CNN), Bidirectional Gated Recurrent Units (BiGRU), and a Self-Attention Mechanism (SA) within a multi-layer architecture. Initially, we preprocess features using Pearson correlation analysis and input them into the CNN to investigate complex nonlinear spatial relationships among multiple feature variables and the current load. Subsequently, the BiGRU captures long-term dependencies from both forward and backward time sequences. Finally, we implement the Self-Attention Mechanism to weigh the features and generate the predicted wind power. We optimize the model's numerous hyperparameters utilizing a Bayesian algorithm. Through comparative ablation experiments with varying time segment lengths on wind farm datasets from four regions, our method significantly outperforms 11 models, including Long Short-Term Memory (LSTM), and surpasses several state-of-the-art (SOTA) prediction models, such as iTransformer, PatchTST, Non-stationary Transformers, TSMixer, and DLinear. The highest coefficient of determination (R²) achieved was 0.981, with the Symmetric Mean Absolute Percentage Error (SMAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) decreasing by 11.22 % to 62.04 % compared to other models. The results demonstrate the predictive accuracy and generalization performance of our proposed model.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104838"},"PeriodicalIF":2.9,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MSL-CCRN: Multi-stage self-supervised learning based cross-modality contrastive representation network for infrared and visible image fusion
Pub Date: 2024-11-06 | DOI: 10.1016/j.dsp.2024.104853
Zhilin Yan , Rencan Nie , Jinde Cao , Guangxu Xie , Zhengze Ding
Infrared and visible image fusion (IVIF) must reconcile the different information carried by the two modalities, so the focus of research is how to better extract this complementary information. In this work, we propose a multi-stage self-supervised learning based cross-modality contrastive representation network for infrared and visible image fusion (MSL-CCRN). Firstly, considering that scene differences between modalities affect the fusion of cross-modal images, we propose a contrastive representation network (CRN). CRN enhances the interaction between the fused image and the source images, and significantly improves the similarity between the meaningful features in each modality and the fused image. Secondly, because IVIF lacks ground truth, the quality of a directly obtained fused image suffers severely. We design a multi-stage fusion strategy to address the loss of important information in this process. Notably, our method is a self-supervised network. In fusion stage I, we reconstruct the initial fused image as the new view for fusion stage II. In fusion stage II, we use the fused image obtained in the previous stage to carry out three-view contrastive representation, thereby constraining the feature extraction from the source images. This introduces more of the source images' important information into the final fused image. Extensive qualitative and quantitative experiments, together with downstream object detection experiments, show that our proposed method performs excellently compared with most advanced methods.
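The three-view contrastive constraint can be illustrated with a standard InfoNCE term that pulls fused-image features toward the features of both source images within a batch; this is a generic stand-in, not the paper's exact loss:

    # InfoNCE between fused-image features and each source modality's features.
    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positive, tau=0.1):
        a = F.normalize(anchor, dim=-1)     # (B, D) fused-image features
        p = F.normalize(positive, dim=-1)   # (B, D) source-image features
        logits = a @ p.t() / tau            # diagonal entries are the positives
        targets = torch.arange(a.size(0))
        return F.cross_entropy(logits, targets)

    fused, ir, vis = (torch.randn(16, 128) for _ in range(3))
    loss = info_nce(fused, ir) + info_nce(fused, vis)  # pull fused toward both sources
    print(loss.item())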
{"title":"MSL-CCRN: Multi-stage self-supervised learning based cross-modality contrastive representation network for infrared and visible image fusion","authors":"Zhilin Yan , Rencan Nie , Jinde Cao , Guangxu Xie , Zhengze Ding","doi":"10.1016/j.dsp.2024.104853","DOIUrl":"10.1016/j.dsp.2024.104853","url":null,"abstract":"<div><div>Infrared and visible image fusion (IVIF) facing different information in two modal scenarios, the focus of research is to better extract different information. In this work, we propose a multi-stage self-supervised learning based cross-modality contrastive representation network for infrared and visible image fusion (MSL-CCRN). Firstly, considering that the scene differences between different modalities affect the fusion of cross-modal images, we propose a contrastive representation network (CRN). CRN enhances the interaction between the fused image and the source image, and significantly improves the similarity between the meaningful features in each modality and the fused image. Secondly, due to the lack of ground truth in IVIF, the quality of directly obtained fused image is seriously affected. We design a multi-stage fusion strategy to address the loss of important information in this process. Notably, our method is a self-supervised network. In fusion stage I, we reconstruct the initial fused image as the new view of fusion stage II. In fusion stage II, we use the fused image obtained in the previous stage to carry out three-view contrastive representation, thereby constraining the feature extraction of the source image. This makes the final fused image introduce more important information in the source image. Through a large number of qualitative, quantitative experiments and downstream object detection experiments, our propose method shows excellent performance compared with most advanced methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104853"},"PeriodicalIF":2.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuous discrete minimum error entropy Kalman filter in non-Gaussian noises system
Pub Date: 2024-10-31 | DOI: 10.1016/j.dsp.2024.104846
Zhifa Liu , Ruide Zhang , Yujie Wang , Haowei Zhang , Gang Wang , Ying Zhang
This paper proposes a continuous-discrete linear Kalman filtering algorithm based on the minimum error entropy criterion for non-Gaussian noise environments. Traditional Kalman filters struggle in such environments due to their reliance on Gaussian assumptions. Our approach leverages stochastic differential equations to precisely model the system dynamics and integrates the minimum error entropy criterion to capture higher-order statistical properties of non-Gaussian noise. Simulations confirm that the proposed algorithm significantly enhances estimation accuracy and robustness compared to conventional methods, demonstrating its effectiveness in complex, noisy environments.
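The minimum error entropy criterion replaces the mean-squared-error cost with the entropy of the error (innovation) sequence, commonly estimated with a Gaussian Parzen window. A minimal numpy estimator of the quadratic Rényi entropy, for illustration only:

    # H2(e) = -log V(e), with information potential
    # V(e) = (1/N^2) * sum_ij kappa_sigma(e_i - e_j).
    import numpy as np

    def error_entropy(e, sigma=1.0):
        d = e[:, None] - e[None, :]                  # pairwise error differences
        k = np.exp(-d**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
        return -np.log(k.mean())                     # cost the MEE filter minimizes

    rng = np.random.default_rng(0)
    gauss = rng.normal(0, 1, 500)
    heavy = rng.standard_t(1.5, 500)                 # impulsive, non-Gaussian errors
    print(error_entropy(gauss), error_entropy(heavy))

Because the cost depends on all pairwise error differences rather than only their second moment, it remains informative when the noise has heavy tails, which is the motivation for using it in place of the Gaussian-optimal quadratic cost.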
{"title":"Continuous discrete minimum error entropy Kalman filter in non-Gaussian noises system","authors":"Zhifa Liu , Ruide Zhang , Yujie Wang , Haowei Zhang , Gang Wang , Ying Zhang","doi":"10.1016/j.dsp.2024.104846","DOIUrl":"10.1016/j.dsp.2024.104846","url":null,"abstract":"<div><div>This paper proposes continuous discrete linear Kalman filtering algorithm based on the minimum error entropy criterion under non-Gaussian noise environments. Traditional Kalman filters struggle in such environments due to their reliance on Gaussian assumptions. Our approach leverages stochastic differential equations to precisely model system dynamics and integrates the minimum error entropy criterion to capture higher-order statistical properties of non-Gaussian noise. Simulations confirm that the proposed algorithm significantly enhances estimation accuracy and robustness compared to conventional methods, demonstrating its effectiveness in handling complex, noisy environments.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104846"},"PeriodicalIF":2.9,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretable ADMM-CSNet for interrupted sampling repeater jamming suppression
Pub Date: 2024-10-31 | DOI: 10.1016/j.dsp.2024.104850
Quan Huang , Shaopeng Wei , Lei Zhang
Interrupted sampling repeater jamming (ISRJ) is a category of coherent jamming that greatly degrades radar detection performance. Since ISRJ has greater power than true targets, ISRJ signals can be removed in the time domain. However, the resulting frequency-band loss produces grating lobes if pulse compression (PC) is performed directly, which may generate false targets. Compressive sensing (CS) is an effective method to restore the original PC signal, but classic CS approaches require manually selecting optimization parameters (e.g., penalty parameters, step sizes) for each ISRJ background, which is challenging. In this article, a network method based on the Alternating Direction Method of Multipliers (ADMM), named ADMM-CSNet, is introduced to solve this problem. Exploiting the strong learning capacity of deep networks, all parameters in the ADMM are learned from radar data via back-propagation rather than selected manually as in traditional CS techniques. Compared with classic CS approaches, higher signal restoration accuracy after ISRJ removal is reached faster. Simulation experiments indicate that the proposed method reconstructs the ISRJ-removed signal effectively and accurately.
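The core idea of learning ADMM parameters by back-propagation can be sketched as one unrolled ADMM stage for the sparse recovery problem min ||y - Ax||² + λ||x||₁, with the penalty and threshold exposed as trainable tensors. Dimensions and weight tying below are illustrative choices, not the paper's network:

    # One learnable ADMM stage for sparse recovery (unrolled-optimization sketch).
    import torch
    import torch.nn as nn

    class ADMMStage(nn.Module):
        def __init__(self, A):
            super().__init__()
            self.A = A                                  # (m, n) sensing matrix
            self.rho = nn.Parameter(torch.tensor(1.0))  # learned penalty
            self.lam = nn.Parameter(torch.tensor(0.1))  # learned threshold

        def forward(self, y, z, u):
            A, n = self.A, self.A.shape[1]
            # x-update: ridge-regularized least squares
            lhs = A.t() @ A + self.rho * torch.eye(n)
            x = torch.linalg.solve(lhs, A.t() @ y + self.rho * (z - u))
            # z-update: soft thresholding; u-update: dual ascent
            v = x + u
            z = torch.sign(v) * torch.clamp(v.abs() - self.lam / self.rho, min=0)
            return x, z, x + u - z

    m, n = 32, 64
    A, y = torch.randn(m, n), torch.randn(m)
    z = u = torch.zeros(n)
    stage = ADMMStage(A)
    for _ in range(5):   # reusing one stage ties the learned weights across iterations
        x, z, u = stage(y, z, u)
    print(x.shape)

Training such stages end-to-end on radar data replaces the manual tuning of rho and lam that classic CS solvers require.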
{"title":"Interpretable ADMM-CSNet for interrupted sampling repeater jamming suppression","authors":"Quan Huang , Shaopeng Wei , Lei Zhang","doi":"10.1016/j.dsp.2024.104850","DOIUrl":"10.1016/j.dsp.2024.104850","url":null,"abstract":"<div><div>Interrupted sampling repeater jamming (ISRJ) is a category of coherent jamming that greatly influences radars' detection performance. Since the ISRJ has greater power than true targets, ISRJ signals can be removed in the time domain. Due to frequency band loss, grating lobes will be produced if pulse compression (PC) is performed directly, which may generate false targets. Compressive sensing (CS) is an effective method to restore the original PC signal. However, it is challenging for classic CS approaches to manually select the optimization parameters (<em>e.g.</em>, penalty parameters, step sizes, etc.) in different ISRJ backgrounds. In this article, a network method based on the Alternating Direction Method of Multipliers (ADMM), named ADMM-CSNet, is introduced to solve the problem. Based on the strong learning capacity of the deep network, all parameters in the ADMM are learned from radar data utilizing back-propagation rather than manually selecting in traditional CS techniques. Compared with classic CS approaches, a higher ISRJ removal signal restoration accuracy is reached faster. Simulation experiments indicate the proposal performs effectively and accurately for ISRJ removal signal reconstruction.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104850"},"PeriodicalIF":2.9,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-learning based joint multi image super-resolution and sub-pixel registration
Pub Date: 2024-10-31 | DOI: 10.1016/j.dsp.2024.104837
Hansol Kim , Sukho Lee , Moon Gi Kang
Multi-image super-resolution (MISR) refers to the task of enhancing the spatial resolution of a stack of low-resolution (LR) images representing the same scene. Although many deep learning-based single image super-resolution (SISR) technologies have recently been developed, deep learning has not been widely exploited for MISR, even though MISR can achieve higher reconstruction accuracy because more information can be extracted from the stack of LR images. One of the primary obstacles deep networks encounter in the MISR problem is the variability in the number of LR images that act as input to the network: the varying count makes it difficult to construct a training dataset, which impedes an end-to-end learning approach. Another challenge arises from the requirement to align the LR input images to generate a high-quality high-resolution (HR) image, which demands complex and sophisticated methods.

In this paper, we propose a self-learning based method that simultaneously performs super-resolution and sub-pixel registration of multiple LR images. The proposed method trains a neural network with only the LR images as input and without any true target HR images; i.e., it requires no extra training dataset. Therefore, the proposed method easily handles different numbers of input images. To our knowledge, this is the first time a neural network has been trained using only LR images to perform joint MISR and sub-pixel registration. Experimental results confirmed that the HR images generated by the proposed method achieved better results in both quantitative and qualitative evaluations than those generated by other deep learning-based methods.
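One way such self-supervision can work (a guess at the general shape, not the authors' exact formulation) is a degradation-consistency objective: the current HR estimate, warped by each frame's estimated sub-pixel shift and then downsampled, must reproduce the corresponding LR input. A minimal sketch with simplistic stand-in operators:

    # Degradation-consistency loss: shift the HR estimate, downsample, match each LR frame.
    import torch
    import torch.nn.functional as F

    def degrade(hr, shift, scale=2):
        # Translate the HR estimate by a sub-pixel shift, then downsample.
        theta = torch.tensor([[[1., 0., shift[0]], [0., 1., shift[1]]]])
        grid = F.affine_grid(theta, hr.shape, align_corners=False)
        return F.avg_pool2d(F.grid_sample(hr, grid, align_corners=False), scale)

    hr_est = torch.randn(1, 1, 64, 64, requires_grad=True)     # network output (stand-in)
    lr_stack = [torch.randn(1, 1, 32, 32) for _ in range(4)]   # observed LR frames
    shifts = [(0.00, 0.00), (0.01, 0.00), (0.00, 0.01), (0.01, 0.01)]  # per-frame shifts
    loss = sum(F.l1_loss(degrade(hr_est, s), lr) for s, lr in zip(shifts, lr_stack))
    loss.backward()   # gradients flow to the HR estimate; in a full method the
                      # shifts would also be trainable, giving joint registration
    print(loss.item())

Because the loss is a sum over however many LR frames are available, an objective of this form is indifferent to the number of input images, which matches the flexibility the paper claims.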
{"title":"Self-learning based joint multi image super-resolution and sub-pixel registration","authors":"Hansol Kim , Sukho Lee , Moon Gi Kang","doi":"10.1016/j.dsp.2024.104837","DOIUrl":"10.1016/j.dsp.2024.104837","url":null,"abstract":"<div><div>Multi Image Super-resolution (MISR) refers to the task of enhancing the spatial resolution of a stack of low-resolution (LR) images representing the same scene. Although many deep learning-based single image super-resolution (SISR) technologies have recently been developed, deep learning has not been widely exploited for MISR, even though it can achieve higher reconstruction accuracy because more information can be extracted from the stack of LR images. One of the primary obstacles encountered by deep networks when addressing the MISR problem is the variability in the number of LR images that act as input to the network. This impedes the feasibility of adopting an end-to-end learning approach, because the varying number of input images makes it difficult to construct a training dataset for the network. Another challenge arises from the requirement to align the LR input images to generate high-resolution (HR) image of high quality, which requires complex and sophisticated methods.</div><div>In this paper, we propose a self-learning based method that can simultaneously perform super-resolution and sub-pixel registration of multiple LR images. The proposed method trains a neural network with only the LR images as input and without any true target HR images; i.e., the proposed method requires no extra training dataset. Therefore, it is easy to use the proposed method to deal with different numbers of input images. To our knowledge this is the first time that a neural network is trained using only LR images to perform a joint MISR and sub-pixel registration. Experimental results confirmed that the HR images generated by the proposed method achieved better results in both quantitative and qualitative evaluations than those generated by other deep learning-based methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104837"},"PeriodicalIF":2.9,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic mode decomposition-based technique for cross-term suppression in the Wigner-Ville distribution
Pub Date: 2024-10-29 | DOI: 10.1016/j.dsp.2024.104833
Alavala Siva Sankar Reddy, Ram Bilas Pachori
This paper presents a new method for time-frequency representation (TFR) using dynamic mode decomposition (DMD) and the Wigner-Ville distribution (WVD), termed DMD-WVD. The proposed method removes cross-terms in WVD-based TFR. In the suggested method, DMD decomposes the multi-component signal into a set of modes, each of which is considered a mono-component signal. The analytic versions of these mono-component signals are computed using the Hilbert transform. The WVD is computed for each analytic mode, and the results are added together to obtain a cross-term-free WVD-based TFR. The effectiveness of the proposed method for TFR is evaluated using Rényi entropy (RE). Experimental results are presented for synthetic signals, namely a multi-component amplitude modulated signal, a multi-component linear frequency modulated (LFM) signal, a multi-component nonlinear frequency modulated (NLFM) signal, a multi-component signal consisting of LFM and NLFM mono-component signals, a multi-component signal consisting of sinusoidal and quadratic frequency modulated mono-component signals, and a synthetic mechanical bearing fault signal, as well as for natural signals, namely electroencephalogram (EEG) and bat echolocation signals. The results show that the proposed method suppresses cross-terms effectively compared with the other existing methods, namely smoothed pseudo WVD (SPWVD), empirical mode decomposition (EMD)-WVD, EMD-SPWVD, variational mode decomposition (VMD)-WVD, VMD-SPWVD, and DMD-SPWVD.
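The assembly step can be sketched directly: make each mode analytic with the Hilbert transform, compute its WVD, and sum. The sketch below stubs out the DMD stage with known mono-components and uses a naive discrete WVD; it illustrates why per-mode processing avoids cross-terms, not the paper's full algorithm:

    # Per-mode WVDs summed: auto-terms survive, cross-terms between modes never form.
    import numpy as np
    from scipy.signal import hilbert

    def wvd(x):
        # W[n, k] = FFT over lag m of x[n+m] * conj(x[n-m]).
        N = len(x)
        acf = np.zeros((N, N), dtype=complex)
        for n in range(N):
            half = min(n, N - 1 - n)
            m = np.arange(-half, half + 1)
            acf[n, m % N] = x[n + m] * np.conj(x[n - m])
        return np.real(np.fft.fft(acf, axis=1))

    t = np.arange(256) / 256.0
    modes = [np.cos(2 * np.pi * 40 * t), np.cos(2 * np.pi * 90 * t)]  # stand-in for DMD output
    tfr = sum(wvd(hilbert(m)) for m in modes)   # cross-term-free by construction
    print(tfr.shape)                            # (256, 256)

Computing the WVD of the sum of the two modes instead would add interference terms oscillating midway between 40 and 90 cycles, which is exactly what the per-mode decomposition suppresses.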
{"title":"Dynamic mode decomposition-based technique for cross-term suppression in the Wigner-Ville distribution","authors":"Alavala Siva Sankar Reddy, Ram Bilas Pachori","doi":"10.1016/j.dsp.2024.104833","DOIUrl":"10.1016/j.dsp.2024.104833","url":null,"abstract":"<div><div>This paper presents a new method for time-frequency representation (TFR) using dynamic mode decomposition (DMD) and Wigner-Ville distribution (WVD), which is termed as DMD-WVD. The proposed method helps in removing cross-term in WVD-based TFR. In the suggested method, the DMD decomposes the multi-component signal into a set of modes where each mode is considered as mono-component signal. The analytic modes of these obtained mono-component signals are computed using the Hilbert transform. The WVD is computed for each analytic mode and added together to obtain cross-term free TFR based on the WVD. The effectiveness of the proposed method for TFR is evaluated using Rényi entropy (RE). Experimental results for synthetic signals namely, multi-component amplitude modulated signal, multi-component linear frequency modulated (LFM) signal, multi-component nonlinear frequency modulated (NLFM) signal, multi-component signal consisting of LFM and NLFM mono-component signal, multi-component signal consisting of sinusoidal and quadratic frequency modulated mono-component signals, and synthetic mechanical bearing fault signal and natural signals namely, electroencephalogram (EEG) and bat echolocation signals are presented in order to show the effectiveness of the proposed method for TFR. It is clear from the results that the proposed method suppresses cross-term effectively as compared to the other existing methods namely, smoothed pseudo WVD (SPWVD), empirical mode decomposition (EMD)-WVD, EMD-SPWVD, variational mode decomposition (VMD)-WVD, VMD-SPWVD, and DMD-SPWVD.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104833"},"PeriodicalIF":2.9,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DuINet: A dual-branch network with information exchange and perceptual loss for enhanced image denoising
Pub Date: 2024-10-28 | DOI: 10.1016/j.dsp.2024.104835
Xiaotong Wang , Yibin Tang , Cheng Yao , Yuan Gao , Ying Chen
Image denoising is a fundamental task in image processing and low-level computer vision, often necessitating a delicate balance between noise removal and the preservation of fine details. In recent years, deep learning approaches, particularly those utilizing various neural network architectures, have shown significant promise in addressing this challenge. In this study, we propose DuINet, a novel dual-branch network specifically designed to capture complementary aspects of image information. DuINet integrates an information exchange module that facilitates effective feature sharing between the branches, and it incorporates a perceptual loss function aimed at enhancing the visual quality of the denoised images. Extensive experimental results demonstrate that DuINet surpasses existing dual-branch models and several state-of-the-art convolutional neural network (CNN)-based methods, particularly under conditions of severe noise where preserving fine details and textures is critical. Moreover, DuINet maintains competitive performance in terms of the LPIPS index when compared to deeper or larger networks such as Restormer and MIRNet, underscoring its ability to deliver high visual quality in denoised images.
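A common realization of such a perceptual loss (assumed here; the abstract does not specify DuINet's exact loss network or layers) compares pretrained VGG-16 feature maps of the denoised and reference images:

    # Perceptual loss: L1 distance in a frozen VGG-16 feature space (relu2_2).
    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    vgg = vgg16(weights="DEFAULT").features[:9].eval()   # downloads pretrained weights
    for p in vgg.parameters():
        p.requires_grad_(False)                          # loss network stays frozen

    def perceptual_loss(denoised, clean):
        return F.l1_loss(vgg(denoised), vgg(clean))

    x = torch.rand(2, 3, 128, 128)    # denoiser output (stand-in)
    y = torch.rand(2, 3, 128, 128)    # ground-truth clean image
    print(perceptual_loss(x, y).item())

Matching features rather than raw pixels penalizes the loss of textures and structures that pixel-wise losses tend to blur away, which is the usual motivation for adding a perceptual term to a denoiser.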
{"title":"DuINet: A dual-branch network with information exchange and perceptual loss for enhanced image denoising","authors":"Xiaotong Wang , Yibin Tang , Cheng Yao , Yuan Gao , Ying Chen","doi":"10.1016/j.dsp.2024.104835","DOIUrl":"10.1016/j.dsp.2024.104835","url":null,"abstract":"<div><div>Image denoising is a fundamental task in image processing and low-level computer vision, often necessitating a delicate balance between noise removal and the preservation of fine details. In recent years, deep learning approaches, particularly those utilizing various neural network architectures, have shown significant promise in addressing this challenge. In this study, we propose DuINet, a novel dual-branch network specifically designed to capture complementary aspects of image information. DuINet integrates an information exchange module that facilitates effective feature sharing between the branches, and it incorporates a perceptual loss function aimed at enhancing the visual quality of the denoised images. Extensive experimental results demonstrate that DuINet surpasses existing dual-branch models and several state-of-the-art convolutional neural network (CNN)-based methods, particularly under conditions of severe noise where preserving fine details and textures is critical. Moreover, DuINet maintains competitive performance in terms of the LPIPS index when compared to deeper or larger networks such as Restormer and MIRNet, underscoring its ability to deliver high visual quality in denoised images.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104835"},"PeriodicalIF":2.9,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142554984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}