Pub Date: 2024-10-02 | DOI: 10.1016/j.dsp.2024.104793
Miroslav Dimitrov
The merit factor problem is of practical importance to manifold domains, such as digital communications engineering, radar, system modulation, system testing, information theory, physics, and chemistry. In this work, some useful mathematical properties related to the flip operation of skew-symmetric binary sequences are presented. By exploiting those properties, the space complexity of state-of-the-art stochastic merit factor optimization algorithms can be reduced from O(n^2) to O(n). As a proof of concept, a lightweight stochastic algorithm was constructed, which can optimize pseudo-randomly generated skew-symmetric binary sequences of long lengths (up to 10^5 + 1) to skew-symmetric binary sequences with a merit factor greater than 5. An approximation of the required time is also provided. The numerical experiments suggest that the algorithm is universal and could be applied to skew-symmetric binary sequences of arbitrary lengths.
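For concreteness, the merit factor of a ±1 sequence of length n is F = n^2 / (2 Σ C_k^2), where C_k are the aperiodic autocorrelations, and a sequence of odd length 2m+1 is skew-symmetric when a[m+i] = (-1)^i a[m-i]. A minimal sketch of these two definitions (not the paper's optimization algorithm):

```python
import numpy as np

def merit_factor(a):
    # Golay merit factor: F = n^2 / (2 * sum_k C_k^2), where C_k is the
    # aperiodic autocorrelation at lag k = 1 .. n-1
    a = np.asarray(a, dtype=float)
    n = len(a)
    c = np.array([np.dot(a[:n - k], a[k:]) for k in range(1, n)])
    return n * n / (2.0 * np.sum(c * c))

def is_skew_symmetric(a):
    # odd length n = 2m + 1 with a[m + i] = (-1)^i * a[m - i] for i = 1 .. m
    n, m = len(a), len(a) // 2
    return n % 2 == 1 and all(a[m + i] == (-1) ** i * a[m - i]
                              for i in range(1, m + 1))

# The length-13 Barker sequence is skew-symmetric and attains the largest
# known merit factor of any binary sequence, 169/12 ~ 14.083
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(merit_factor(barker13), is_skew_symmetric(barker13))
```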
Title: "On the skew-symmetric binary sequences and the merit factor problem" (Digital Signal Processing, vol. 156, Article 104793)
Pub Date: 2024-10-01 | DOI: 10.1016/j.dsp.2024.104782
Hadi Zayyani , Mehdi Korki
Secure distributed estimation algorithms are designed to protect against a spectrum of attacks by exploring different attack models and implementing strategies that enhance the resilience of the algorithm. These models encompass diverse scenarios such as measurement sensor attacks and communication link attacks, which have been extensively investigated in the existing literature. This paper, however, focuses on a specific type of attack: the multiplicative sensor attack model. To counter this, the paper introduces the average diffusion least mean square (ADLMS) algorithm as a viable solution. Furthermore, the paper introduces the Average Likelihood Ratio Test (ALRT) detector, which provides a straightforward detection criterion. In the presence of communication link attacks, the paper considers the manipulation attack model and presents an ALRT adversary detector. The analysis extends to these ALRT detectors, encompassing the calculation of adversary detection probability and false alarm probability, both achieved in closed form. The paper also provides the mean convergence analysis of the proposed ADLMS algorithm. Simulation results reveal that the proposed algorithms exhibit enhanced performance compared to the DLMS algorithm, while the incremental complexity remains only marginally higher than that of the DLMS algorithm.
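As background, diffusion LMS lets each node combine its neighbors' estimates and then run a local LMS adaptation step on its own data. The toy sketch below shows that combine-then-adapt pattern on a fully connected four-node network with uniform combination weights; it is illustrative only and omits the paper's averaging-based attack countermeasures (all parameter choices are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])   # unknown parameter to estimate

def lms_step(w, x, d, mu=0.05):
    # adapt: standard LMS gradient step at one node
    e = d - x @ w
    return w + mu * e * x

# diffuse-then-adapt loop: combine (average neighbor estimates), then
# adapt at each node from its local measurement d = x.w_true + noise
W = [np.zeros(3) for _ in range(4)]
for _ in range(2000):
    avg = np.mean(W, axis=0)              # combine step
    for k in range(4):
        x = rng.standard_normal(3)
        d = x @ w_true + 0.01 * rng.standard_normal()
        W[k] = lms_step(avg, x, d)        # adapt step

print(np.round(np.mean(W, axis=0), 2))
```

Cooperation lets every node's estimate inherit information from the whole network; the secure variants in the paper harden exactly this exchange.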
Title: "Secure distributed estimation via an average diffusion LMS and average likelihood ratio test" (Digital Signal Processing, vol. 156, Article 104782)
Pub Date: 2024-10-01 | DOI: 10.1016/j.dsp.2024.104777
Tuncay Eren
In the evolution of fifth generation (5G) and beyond wireless communication systems, non-coherent (NC) short packet communication (SPC) is crucial for achieving ultra-reliable low-latency communication (URLLC). Frame design, latency, and reliability are some of the challenges associated with short-packet communication. Recently, to address these challenges, a novel modulation scheme known as modulation on conjugate-reciprocal zeros (MOCZ) has been proposed. MOCZ modulates information on conjugate-reciprocal zeros in the z-domain, thereby eliminating the need for channel estimation and providing a robust solution for NC communication. However, in the multi-user MOCZ (MU-MOCZ) scheme, adding guard intervals to each short packet to mitigate channel impact remains an issue, as it increases the transmission time and consequently reduces efficiency. To address the aforementioned problem, this paper introduces a novel frame design approach called z-domain user multiplexing MOCZ (ZDUM-MOCZ or ZDM-MOCZ). Unlike traditional time division multiplexing (TDM), which serves users consecutively in the time domain, this method multiplexes users in the z-domain. In this approach, each user is allocated a specific set of zeros in the z-domain, which collectively form a unique sequence in the time domain. The findings illustrate the potential for reduced latency in downlink transmission, highlighting the benefits of this novel methodology over the conventional MU-MOCZ method. The proposed ZDUM-MOCZ scheme not only addresses the existing issues in the frame design of the MU-MOCZ scheme but also facilitates more efficient and reliable short packet communication in 5G and beyond wireless systems.
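The core MOCZ idea can be sketched in a few lines: each bit selects one of two conjugate-reciprocal zero positions, the packet is the coefficient vector of the polynomial with those zeros, and the receiver decodes by testing at which candidate zero the received polynomial (the transmitted one convolved with the unknown channel) is closer to vanishing. No channel estimate is needed because convolution only adds the channel's zeros. A toy noiseless sketch (the radius and angles are our illustrative choices, not the paper's design):

```python
import numpy as np

def mocz_encode(bits, R=1.5):
    # bit k = 1 places a zero at radius R, bit k = 0 at radius 1/R,
    # each at its own angle 2*pi*k/K; the packet is the coefficient
    # vector of the polynomial with exactly those zeros
    K = len(bits)
    zeros = [(R if b else 1.0 / R) * np.exp(2j * np.pi * k / K)
             for k, b in enumerate(bits)]
    return np.poly(zeros)

def mocz_decode(y, K, R=1.5):
    # non-coherent decoding: evaluate the received polynomial at both
    # candidate zeros of each slot and pick the one closer to a root
    bits = []
    for k in range(K):
        ang = np.exp(2j * np.pi * k / K)
        outer = abs(np.polyval(y, R * ang))
        inner = abs(np.polyval(y, ang / R))
        bits.append(1 if outer < inner else 0)
    return bits

bits = [1, 0, 1, 1, 0, 0, 1, 0]
x = mocz_encode(bits)
h = np.array([1.0, 0.3 - 0.2j])      # unknown multipath channel
y = np.convolve(h, x)                # received packet = channel * packet
print(mocz_decode(y, len(bits)) == bits)  # True: the data zeros survive
```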
Title: "Non-coherent short-packet communications: Novel z-domain user multiplexing" (Digital Signal Processing, vol. 156, Article 104777)
Pub Date: 2024-10-01 | DOI: 10.1016/j.dsp.2024.104791
Zhiqiang Hou , Minjie Qu , Minjie Cheng , Sugang Ma , Yunchen Wang , Xiaobao Yang
In the semantic segmentation field, the dual-branch structure is a highly effective segmentation model. However, the frequent downsampling in the semantic branch reduces the accuracy of feature expression as network depth increases, resulting in suboptimal segmentation performance. To address this issue, this paper proposes a real-time semantic segmentation network based on edge feature refinement (Edge Feature Refinement Network, EFRNet). A dual-branch structure is used in the encoder. To enhance the accuracy of deep feature expression in the network, an Edge Refinement Module (ERM) is designed in the dual-branch interaction stage to refine the features of the two branches and improve segmentation accuracy. In the decoder, a Bilateral Channel Attention (BCA) module is designed to extract detailed and semantic information from features at different levels of the network and gradually restore small-target features. To capture multi-scale context information, we introduce a Multi-scale Context Aggregation Module (MCAM), which efficiently integrates multi-scale information in a parallel manner. The proposed algorithm was evaluated on the Cityscapes and CamVid datasets, reaching 78.8% mIoU at 81 FPS and 79.6% mIoU at 115 FPS, respectively. Experimental results show that the proposed algorithm effectively improves segmentation performance while maintaining a high segmentation speed.
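For reference, the mIoU figures quoted above are the mean, over classes, of the intersection-over-union between predicted and ground-truth label maps. A minimal implementation of the metric (not of EFRNet itself):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # mIoU: average over classes of |pred ∩ gt| / |pred ∪ gt|,
    # skipping classes absent from both prediction and ground truth
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1])
gt   = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, 2))  # (1/2 + 2/3) / 2 ~ 0.583
```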
Title: "EFRNet: Edge feature refinement network for real-time semantic segmentation of driving scenes" (Digital Signal Processing, vol. 156, Article 104791)
Pub Date: 2024-09-30 | DOI: 10.1016/j.dsp.2024.104792
Caiyu Li, Yan Wo
Face forgery detection is crucial for the security of digital identities. However, existing methods often struggle to generalize effectively to unseen domains due to the domain shift between training and testing data. We propose a Domain-robust Representation Learning (DRRL) method for generalized face forgery detection. Specifically, we observe that domain shifts in face forgery detection tasks are often caused by forgery differences and content differences between domain data, while the limitations of training data lead the model to overfit to these feature expressions in the seen domain. Therefore, DRRL enhances the model's generalization to unseen domains by first adding representative data representations to mitigate overfitting to seen data and then removing the features of expressed domain information to learn a robust, discriminative representation of domain variation. Data augmentation is achieved by stylizing sample representations and exploring representative new styles to generate rich data variants, with the Content-style Augmentation (CSA) module and Forgery-style Augmentation (FSA) module implemented for content and forgery expression, respectively. Based on this, the Content Decorrelation (CTD) module and Sensitive Channels Drop (SCD) module are used to remove content features irrelevant to forgery and domain-sensitive forgery features, encouraging the model to focus on clean and robust forgery features, thereby achieving the goal of learning domain-robust representations. Extensive experiments on five large-scale datasets demonstrate that our method exhibits advanced and stable generalization performance in practical scenarios.
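The style-based augmentation described above can be pictured with a MixStyle-like operation on feature statistics: a sample is "restyled" by re-normalizing its feature map with a mix of its own channel statistics and another sample's. This is a generic sketch of the idea, not the paper's CSA/FSA modules:

```python
import numpy as np

def mix_style(f_a, f_b, lam=0.5, eps=1e-6):
    # restyle f_a using a convex mix of the channel-wise mean/std of
    # f_a and f_b (feature maps are C x H x W)
    mu_a = f_a.mean(axis=(1, 2), keepdims=True)
    sd_a = f_a.std(axis=(1, 2), keepdims=True) + eps
    mu_b = f_b.mean(axis=(1, 2), keepdims=True)
    sd_b = f_b.std(axis=(1, 2), keepdims=True) + eps
    mu = lam * mu_a + (1 - lam) * mu_b
    sd = lam * sd_a + (1 - lam) * sd_b
    return (f_a - mu_a) / sd_a * sd + mu   # keep content, swap style

rng = np.random.default_rng(1)
f_a = rng.standard_normal((8, 4, 4))
f_b = 3.0 * rng.standard_normal((8, 4, 4)) + 2.0
out = mix_style(f_a, f_b, lam=0.0)  # lam = 0: adopt f_b's statistics
```

Training on such restyled variants exposes the model to content and style combinations absent from the seen domain, which is the overfitting the abstract targets.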
Title: "Towards generalized face forgery detection with domain-robust representation learning" (Digital Signal Processing, vol. 156, Article 104792)
Pub Date: 2024-09-30 | DOI: 10.1016/j.dsp.2024.104798
Yaobin Zou , Shutong Chen
To automatically threshold images with unimodal, bimodal, multimodal or non-modal gray-level distributions within a unified framework, an automatic thresholding method using a single information entropy under the product transformation of order-difference filter responses is proposed. The method first performs the product transformation of order-difference filter responses on an input image at different scales to obtain the product transformation image. Each pixel of the binary images corresponding to different thresholds is then labelled as critical or non-critical, constructing a series of binary label images used to distinguish critical from non-critical regions. Finally, a single information entropy characterizes the information obtained from the product transformation image within the critical regions of the different binary label images, and the threshold corresponding to the maximum information entropy is selected as the final threshold. The proposed method is compared with seven state-of-the-art segmentation methods. Experimental results on 12 synthetic images and 98 real-world images show that the average Matthews correlation coefficients of the proposed method reached 0.994 and 0.966, respectively, outperforming the second-best method by 52.4% and 27.8%. The proposed method adapts more robustly to test images with different modalities, despite offering no advantage in computational efficiency.
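As background, the classic maximum-entropy (Kapur-style) criterion picks the threshold that maximizes the summed entropies of the two resulting gray-level populations; the paper's contribution is to apply a single entropy to its product-transformation image rather than to the raw histogram. A baseline sketch of the classic criterion only:

```python
import numpy as np

def max_entropy_threshold(img):
    # Kapur-style thresholding: choose t maximizing H(background) +
    # H(foreground), each entropy computed on the renormalized part of
    # the gray-level histogram below / at-or-above t
    hist = np.bincount(np.asarray(img).ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 0, -np.inf
    for t in range(1, 256):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 == 0 or p1 == 0:
            continue
        q0, q1 = p[:t] / p0, p[t:] / p1
        h = -(q0[q0 > 0] * np.log(q0[q0 > 0])).sum() \
            - (q1[q1 > 0] * np.log(q1[q1 > 0])).sum()
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# bimodal synthetic image: modes near gray levels 60 and 200
rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(60, 5, 2000),
                      rng.normal(200, 5, 2000)]).clip(0, 255).astype(np.uint8)
t = max_entropy_threshold(img)  # threshold separates the two modes
```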
Title: "Automatic thresholding method using single information entropy under product transformation of order difference filter response" (Digital Signal Processing, vol. 156, Article 104798)
Producing high-quality, noise-free images from noisy or hazy inputs relies on essential tasks such as single image deraining and dehazing. In many advanced multi-stage networks, there is often an imbalance in contextual information, leading to increased complexity. To address these challenges, we propose a simplified method inspired by a U-Net structure, resulting in the "Single-Stage V-Shaped Network" (S2VSNet), capable of handling both deraining and dehazing tasks. A key innovation in our approach is the introduction of a Feature Fusion Module (FFM), which facilitates the sharing of information across multiple scales and hierarchical layers within the encoder-decoder structure. As the network progresses towards deeper layers, the FFM gradually integrates insights from higher levels, ensuring that spatial details are preserved while contextual feature maps are balanced. This integration enhances the image processing capability, producing noise-free, high-quality outputs. To maintain efficiency and reduce system complexity, we replaced or removed several non-essential non-linear activation functions, opting instead for simple multiplication operations. Additionally, we introduced a "Multi-Head Attention Integrated Module" (MHAIM) as an intermediary layer between encoder-decoder levels. This module addresses the limited receptive fields of traditional Convolutional Neural Networks (CNNs), allowing for the capture of more comprehensive feature-map information. Our focus on deraining and dehazing led to extensive experiments on a wide range of synthetic and real-world datasets. To further validate the robustness of our network, we implemented S2VSNet on a low-end edge device, achieving deraining in 2.46 seconds.
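The "simple multiplication operations" replacing activation functions can be pictured as a gating product over split channels, as used in some recent gated restoration designs; this is our illustration of the idea, not necessarily the exact operation in S2VSNet:

```python
import numpy as np

def simple_gate(f):
    # split a C x H x W feature map along channels and multiply the
    # halves elementwise: a nonlinearity built from a plain product,
    # with no activation function involved
    c = f.shape[0] // 2
    return f[:c] * f[c:]

f = np.arange(8.0).reshape(2, 2, 2)   # C = 2 -> output has C = 1
print(simple_gate(f)[0])
```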
Pub Date: 2024-09-30 | DOI: 10.1016/j.dsp.2024.104786
Thatikonda Ragini, Kodali Prakash, Ramalinga Swamy Cheruku
Title: "S2VSNet: Single stage V-shaped network for image deraining & dehazing" (Digital Signal Processing, vol. 156, Article 104786)
This paper introduces the kernel recursive q-Rényi-like (KRqRL) algorithm, based on the q-Rényi kernel function and the kernel recursive least squares (KRLS) algorithm. To reduce the computational complexity and memory requirements of the KRqRL algorithm, an online vector quantization (VQ) method is employed to quantize the network size to a codebook size, resulting in the quantized KRqRL (QKRqRL) algorithm. This paper provides a detailed analysis of the convergence and computational complexity of the QKRqRL algorithm. In the simulation experiments, the network size of each algorithm is reduced to 25% of its original size. The performance of the QKRqRL algorithm is evaluated in terms of convergence speed, prediction error, and computation time under non-Gaussian noise conditions. Finally, the QKRqRL algorithm is further validated using sunspot data, demonstrating its superior stability and online prediction performance.
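The online VQ step can be illustrated on plain kernel LMS: each new input either merges into its nearest codeword (within radius eps) or extends the codebook, so memory stays bounded by the codebook size instead of growing with every sample. A sketch of that quantization idea under our own parameter choices (the paper applies it to the recursive q-Rényi-like algorithm, not to KLMS):

```python
import numpy as np

def gauss(u, v, sigma=0.7):
    return np.exp(-(u - v) ** 2 / (2 * sigma ** 2))

class QKLMS:
    def __init__(self, eta=0.5, eps=0.3):
        self.eta, self.eps = eta, eps
        self.centers, self.alpha = [], []   # codebook and coefficients

    def predict(self, u):
        return sum(a * gauss(c, u) for c, a in zip(self.centers, self.alpha))

    def update(self, u, d):
        e = d - self.predict(u)
        if self.centers:
            j = min(range(len(self.centers)),
                    key=lambda i: abs(u - self.centers[i]))
            if abs(u - self.centers[j]) <= self.eps:
                self.alpha[j] += self.eta * e   # quantize: reuse codeword
                return
        self.centers.append(u)                  # else grow the codebook
        self.alpha.append(self.eta * e)

# learn d = sin(u) online; the codebook stays bounded by ~ 2*pi / eps
rng = np.random.default_rng(0)
f = QKLMS()
for _ in range(1500):
    u = rng.uniform(0, 2 * np.pi)
    f.update(u, np.sin(u))
print(len(f.centers))
```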
Pub Date: 2024-09-27 | DOI: 10.1016/j.dsp.2024.104790
Wenwen Zhou, Yanmin Zhang, Chunlong Huang, Sergey V. Volvenko, Wei Xue
Title: "Quantized kernel recursive q-Rényi-like algorithm" (Digital Signal Processing, vol. 156, Article 104790)
With the improvement of the spatial resolution of remote sensing images, object detection in remote sensing images has gradually become a difficult task. Extracted object features are usually hidden in a large amount of interference information in the background, due to the complexity and large area of backgrounds as well as the multi-scale nature of objects in remote sensing images. Still, many existing background weakening methods face difficulties in practical applications and are prone to high rates of false positives and false negatives. Therefore, remote sensing object detection has become increasingly challenging. To address these challenges, a novel background weakening method, the Difference of Gaussian (DoG) to weaken background (DWB) module, is proposed. We then develop a dual-branch network, named DoG-Enhanced Dual-Branch Object Detection Network (DEDBNet), for remote sensing object detection. The base branch network is responsible for detecting objects, while the DWB branch network corrects the detected objects using feature-level attention. To combine the features of these branches, we propose two new methods: Self-Mutual-Corrector with Detect heads (SMCD) for corrective learning and Map Channel Attention (MCA) for channel attention. The Self-Corrector (SC) enables modification and integration of features, while the Mutual-Corrector (MC) enhances the features and further fuses them. We evaluate our proposed network, DEDBNet, through extensive experiments on four public datasets (DOTA with an mAP of 0.836, DIOR with an mAP of 0.871, NWPU VHR-10 with an mAP of 0.973, and RSOD with an mAP of 0.975). The results demonstrate that our method significantly outperforms other state-of-the-art object detection methods for remote sensing images.
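The Difference of Gaussian operation underlying the DWB module is classical: subtracting a wider Gaussian blur from a narrower one yields a band-pass response that flattens uniform background while keeping edge-scale structure. A pure-NumPy sketch of DoG alone (the sigmas are our choice, and this is not the full DWB module):

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    # separable Gaussian blur: convolve each row, then each column
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)

def dog(img, s1=1.0, s2=2.0):
    # Difference of Gaussians: band-pass filter that suppresses flat
    # background and responds near edges
    return blur(img, s1) - blur(img, s2)

# flat regions give (near-)zero response; a vertical step edge responds
img = np.zeros((32, 32))
img[:, 16:] = 1.0
r = dog(img)
```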
{"title":"DEDBNet: DoG-enhanced dual-branch object detection network for remote sensing object detection","authors":"Dongbo Pan, Jingfeng Zhao, Tianchi Zhu, Jianjun Yuan","doi":"10.1016/j.dsp.2024.104789","DOIUrl":"10.1016/j.dsp.2024.104789","url":null,"abstract":"<div><div>With the improvement of spatial resolution of remote sensing images, object detection of remote sensing images has gradually become a difficult task. Extracted object features are usually hidden in a large amount of interference information in the background due to the complexity and large area of backgrounds, as well as the multi-scale nature of objects in remote sensing images. Still, many existing background weakening methods face difficulties in practical applications and are prone to high rates of false positives and false negatives. Therefore, remote sensing object detection has become increasingly challenging. To address these challenges, a novel background weakening method called Difference of Gaussian (DoG) to weaken background (DWB) module is proposed. Then, we develop a dual-branch network, named DoG-Enhanced Dual-Branch Object Detection Network (DEDBNet) for Remote Sensing Object Detection. The base branch network is responsible for detecting objects, while the DWB's branch network corrects the detected objects using feature-level attention. To combine the features of these branches, we propose two new methods Self-Mutual-Correcter with Detect heads (SMCD) for corrective learning and Map Channel Attention (MCA) for channel attention. Self-Corrector (SC) enables modification and integration of features, while the Mutual-Corrector (MC) enhances the features and further fuses them. We evaluate our proposed network, DEDBNet, through extensive experiments on four public datasets (DOTA with an mAP of 0.836, DIOR with an mAP of 0.871, NWPU VHR-10 with an mAP of 0.973, and RSOD with an mAP of 0.975). 
The results demonstrate that our method outperforms other state-of-the-art object detection methods significantly for remote sensing images.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104789"},"PeriodicalIF":2.9,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142428112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
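The DoG principle behind background weakening can be sketched in a few lines: a coarse Gaussian blur models the slowly varying background, and subtracting it from a fine blur keeps only object-scale structure. This is an illustration of the general Difference-of-Gaussian band-pass idea, not the paper's DWB module; the sigmas and the synthetic scene are assumptions for demonstration.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    # truncated Gaussian, normalized to unit sum
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    # separable 2-D blur: 1-D convolution along columns, then rows
    k = gaussian_kernel1d(sigma)
    out = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="same")

def dog_weaken_background(img, sigma_fine=1.0, sigma_coarse=4.0):
    # coarse blur estimates the smooth background; the difference
    # suppresses it while retaining object-scale detail
    return gaussian_blur(img, sigma_fine) - gaussian_blur(img, sigma_coarse)

# synthetic scene: a bright blob on a slowly varying background ramp
y, x = np.mgrid[0:64, 0:64].astype(float)
scene = 0.01 * x + np.exp(-((x - 32)**2 + (y - 32)**2) / 8.0)
response = dog_weaken_background(scene)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)  # the DoG response peaks at the blob, not on the ramp
```

The ramp (a stand-in for smooth background clutter) is nearly cancelled by the subtraction, so the response concentrates on the compact object, which is the effect the DWB branch exploits before applying feature-level attention.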
Pub Date : 2024-09-24DOI: 10.1016/j.dsp.2024.104788
Qiqiang Wu , Xianmin Zhang , Bo Zhao
High-end mechanical equipment often operates under non-stationary conditions, such as varying loads, changing speeds, and transient impacts, which can lead to failures. Time-frequency analysis (TFA) integrates time and frequency parameters, enabling detailed signal analysis, and is widely used in this context. To improve the accuracy of assessing the operational status of mechanical equipment, this paper proposes a multi-modal signal adaptive time-reassigned multisynchrosqueezing transform (MSST) TFA method. The method enhances MSST by using a local maximum technique to address energy ambiguity in the TFA. Additionally, the optimal window width for each function is determined iteratively to better concentrate energy in the TFA. Multi-modal signals are jointly analyzed using an impulse feature extraction method for signal reconstruction, enabling multi-dimensional fault analysis. The proposed method is validated on both simulation and experimental data from a planar parallel mechanism (PPM) and is compared against classical and advanced techniques. The results show that the method effectively captures shock features in multi-modal signals, offering a more concentrated time-frequency representation (TFR) than existing TFA algorithms.
{"title":"Multi-modal signal adaptive time-reassigned multisynchrosqueezing transform of mechanism","authors":"Qiqiang Wu , Xianmin Zhang , Bo Zhao","doi":"10.1016/j.dsp.2024.104788","DOIUrl":"10.1016/j.dsp.2024.104788","url":null,"abstract":"<div><div>High-end mechanical equipment often operates under non-stationary conditions, such as varying loads, changing speeds, and transient impacts, which can lead to failures. Time-frequency analysis (TFA) integrates time and frequency parameters, allowing for detailed signal analysis and is widely used in this context. To improve the accuracy of assessing the operational status of mechanical equipment, this paper proposed a multi-modal signal adaptive time reassignment multiple synchrosqueezing transform (MSST) TFA method. This method enhances the MSST method by using a local maximum technique to address energy ambiguity in TFA. Additionally, the optimal window width for each function is determined through iterative processes to better concentrate energy in the TFA. Multi-modal signals are jointly analyzed using an impulse feature extraction method for signal reconstruction, enabling multi-dimensional fault analysis. The proposed method is validated with both simulation and experimental data from a planar parallel mechanism (PPM) and is compared against classical and advanced techniques. 
The results show that the method effectively captures shock features in multi-modal signals, offering a more consolidated time-frequency representation (TFR) than existing TFA algorithms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104788"},"PeriodicalIF":2.9,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142359606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
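The local-maximum idea used to sharpen the time-frequency representation can be illustrated on a short-time Fourier spectrogram: at each time instant, energy is attributed to the frequency bin where the magnitude peaks, collapsing a smeared ridge to a single curve. This is a minimal sketch of ridge extraction by local maxima, not the paper's MSST method; the window length, hop size, and test chirp are illustrative assumptions.

```python
import numpy as np

def stft_mag(x, win_len=64, hop=16):
    """Magnitude STFT with a Hann window (one-sided spectrum)."""
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

def ridge_by_local_maxima(S):
    """Keep, per time instant, the frequency bin of maximum energy,
    mimicking the local-maximum reassignment idea."""
    return np.argmax(S, axis=0)

fs = 1000
t = np.arange(2000) / fs
# linear chirp: instantaneous frequency sweeps from 50 Hz to 250 Hz
chirp = np.cos(2 * np.pi * (50 * t + 50 * t**2))
S = stft_mag(chirp)
ridge_bins = ridge_by_local_maxima(S)
ridge_hz = ridge_bins * fs / 64
print(ridge_hz[0], ridge_hz[-1])  # tracks the sweep from low to high
```

On a mono-component signal the extracted ridge follows the instantaneous frequency law; MSST-type methods refine this by iterating the squeezing so that multi-component, non-stationary signals yield similarly concentrated ridges.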