Pub Date: 2024-10-31; DOI: 10.1016/j.dsp.2024.104850
Quan Huang, Shaopeng Wei, Lei Zhang
Interrupted sampling repeater jamming (ISRJ) is a category of coherent jamming that severely degrades radar detection performance. Because ISRJ has greater power than true targets, ISRJ signals can be removed in the time domain. Due to frequency band loss, grating lobes are produced if pulse compression (PC) is then performed directly, which may generate false targets. Compressive sensing (CS) is an effective way to restore the original PC signal. However, classic CS approaches require the optimization parameters (e.g., penalty parameters and step sizes) to be selected manually, which is difficult across different ISRJ backgrounds. In this article, a network based on the Alternating Direction Method of Multipliers (ADMM), named ADMM-CSNet, is introduced to solve this problem. Exploiting the learning capacity of deep networks, all ADMM parameters are learned from radar data by back-propagation rather than selected manually as in traditional CS techniques. Compared with classic CS approaches, the method restores the signal after ISRJ removal more accurately and in less time. Simulation experiments indicate that the proposed method reconstructs the signal after ISRJ removal effectively and accurately.
Title: Interpretable ADMM-CSNet for interrupted sampling repeater jamming suppression. Journal: Digital Signal Processing, vol. 156, Article 104850.
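The abstract above refers to ADMM-based compressive sensing with hand-tuned penalty parameters and step sizes. As a rough illustration of what those parameters are, the following is a minimal sketch of textbook ADMM for l1-regularized least squares, assuming a real-valued random measurement matrix and fixed values of `lam` and `rho`. It is not the ADMM-CSNet of the paper: the function names, the problem sizes, and the parameter values are all assumptions chosen for the example, and radar data would be complex-valued.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Element-wise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, y, lam=0.05, rho=1.0, n_iter=500):
    """Minimise 0.5 * ||A x - y||^2 + lam * ||x||_1 with ADMM (real-valued case).

    lam (sparsity weight) and rho (penalty parameter) are fixed by hand here;
    these are the kind of parameters a learned, unrolled network would train.
    """
    n = A.shape[1]
    # The x-update solves the same linear system each iteration, so invert once.
    inv = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Aty = A.T @ y
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        x = inv @ (Aty + rho * (z - u))        # quadratic (data-fit) step
        z = soft_threshold(x + u, lam / rho)   # sparsity step
        u = u + x - z                          # scaled dual update
    return z

# Hypothetical toy problem: recover a 5-sparse vector from 60 random measurements.
rng = np.random.default_rng(0)
m, n, k = 60, 128, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = A @ x_true

x_hat = admm_lasso(A, y)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Changing `lam` or `rho` alters both the accuracy and the number of iterations needed, which is the tuning burden the abstract describes.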
Pub Date: 2024-10-31; DOI: 10.1016/j.dsp.2024.104837
Hansol Kim, Sukho Lee, Moon Gi Kang
Multi-image super-resolution (MISR) refers to the task of enhancing the spatial resolution of a stack of low-resolution (LR) images representing the same scene. Although many deep learning-based single image super-resolution (SISR) technologies have recently been developed, deep learning has not been widely exploited for MISR, even though MISR can achieve higher reconstruction accuracy because more information can be extracted from the stack of LR images. One of the primary obstacles encountered by deep networks when addressing the MISR problem is the variability in the number of LR images that act as input to the network. This impedes an end-to-end learning approach, because the varying number of input images makes it difficult to construct a training dataset for the network. Another challenge is that the LR input images must be aligned to generate a high-quality high-resolution (HR) image, which requires complex and sophisticated methods.
In this paper, we propose a self-learning based method that can simultaneously perform super-resolution and sub-pixel registration of multiple LR images. The proposed method trains a neural network with only the LR images as input and without any true target HR images; i.e., the proposed method requires no extra training dataset. Therefore, it is easy to use the proposed method to deal with different numbers of input images. To our knowledge this is the first time that a neural network is trained using only LR images to perform a joint MISR and sub-pixel registration. Experimental results confirmed that the HR images generated by the proposed method achieved better results in both quantitative and qualitative evaluations than those generated by other deep learning-based methods.
Title: Self-learning based joint multi image super-resolution and sub-pixel registration. Journal: Digital Signal Processing, vol. 156, Article 104837.
Pub Date: 2024-10-29; DOI: 10.1016/j.dsp.2024.104833
Alavala Siva Sankar Reddy, Ram Bilas Pachori
This paper presents a new method for time-frequency representation (TFR) that combines dynamic mode decomposition (DMD) with the Wigner-Ville distribution (WVD), termed DMD-WVD. The proposed method removes cross-terms in WVD-based TFR. DMD first decomposes the multi-component signal into a set of modes, each of which is treated as a mono-component signal. The analytic form of each mode is computed using the Hilbert transform. The WVD is then computed for each analytic mode, and the results are summed to obtain a cross-term-free WVD-based TFR. The effectiveness of the method is evaluated using Rényi entropy (RE). Results are presented for synthetic signals (a multi-component amplitude modulated signal, a multi-component linear frequency modulated (LFM) signal, a multi-component nonlinear frequency modulated (NLFM) signal, a multi-component signal consisting of LFM and NLFM mono-component signals, a multi-component signal consisting of sinusoidal and quadratic frequency modulated mono-component signals, and a synthetic mechanical bearing fault signal) and for natural signals (electroencephalogram (EEG) and bat echolocation signals). The results show that the proposed method suppresses cross-terms effectively compared with existing methods, namely smoothed pseudo WVD (SPWVD), empirical mode decomposition (EMD)-WVD, EMD-SPWVD, variational mode decomposition (VMD)-WVD, VMD-SPWVD, and DMD-SPWVD.
Title: Dynamic mode decomposition-based technique for cross-term suppression in the Wigner-Ville distribution. Journal: Digital Signal Processing, vol. 156, Article 104833.
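The per-mode summation described in the abstract above can be illustrated with a small sketch. It assumes the two mono-component modes are already available separately (the DMD step itself is not implemented here), and it uses a basic discrete WVD and an FFT-based analytic signal written for this example. The function names `analytic` and `wvd`, the signal length, and the tone frequencies are all assumptions.

```python
import numpy as np

def analytic(x):
    """Analytic signal via the FFT (Hilbert-transform construction)."""
    N = len(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:(N + 1) // 2] = 2.0        # keep positive frequencies, doubled
    if N % 2 == 0:
        h[N // 2] = 1.0            # Nyquist bin for even lengths
    return np.fft.ifft(np.fft.fft(x) * h)

def wvd(x):
    """Discrete Wigner-Ville distribution W[k, n] of an analytic signal x.

    Frequency bin k corresponds to k / (2 * N) cycles per sample, because
    the lag kernel x[n + m] * conj(x[n - m]) advances in steps of two samples.
    """
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        mmax = min(n, N - 1 - n)                 # largest lag inside the record
        m = np.arange(-mmax, mmax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[m % N] = x[n + m] * np.conj(x[n - m])
        W[:, n] = np.fft.fft(kernel).real        # kernel is conjugate-symmetric
    return W

N = 128
t = np.arange(N)
mode1 = np.cos(2 * np.pi * 16 * t / N)   # tone at 0.125 cycles/sample -> bin 32
mode2 = np.cos(2 * np.pi * 48 * t / N)   # tone at 0.375 cycles/sample -> bin 96

# WVD of the mixture: contains a cross-term midway between the tones (bin 64).
W_mix = wvd(analytic(mode1 + mode2))
# Sum of per-mode WVDs: the auto-terms remain, the cross-term does not appear.
W_sum = wvd(analytic(mode1)) + wvd(analytic(mode2))

mid = N // 2
print("bin 64 at mid-time, mixture:     ", W_mix[64, mid])
print("bin 64 at mid-time, per-mode sum:", W_sum[64, mid])
```

The cross-term arises because the WVD is quadratic in the signal, so summing the WVDs of separated modes omits the product between different modes. How well this works in practice depends on how cleanly the decomposition separates the components.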
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104835
Xiaotong Wang, Yibin Tang, Cheng Yao, Yuan Gao, Ying Chen
Image denoising is a fundamental task in image processing and low-level computer vision, often necessitating a delicate balance between noise removal and the preservation of fine details. In recent years, deep learning approaches, particularly those utilizing various neural network architectures, have shown significant promise in addressing this challenge. In this study, we propose DuINet, a novel dual-branch network specifically designed to capture complementary aspects of image information. DuINet integrates an information exchange module that facilitates effective feature sharing between the branches, and it incorporates a perceptual loss function aimed at enhancing the visual quality of the denoised images. Extensive experimental results demonstrate that DuINet surpasses existing dual-branch models and several state-of-the-art convolutional neural network (CNN)-based methods, particularly under conditions of severe noise where preserving fine details and textures is critical. Moreover, DuINet maintains competitive performance in terms of the LPIPS index when compared to deeper or larger networks such as Restormer and MIRNet, underscoring its ability to deliver high visual quality in denoised images.
Title: DuINet: A dual-branch network with information exchange and perceptual loss for enhanced image denoising. Journal: Digital Signal Processing, vol. 156, Article 104835.
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104817
Huu Q. Tran, Lam Hoang Kham, Ho Van Khuong
In this research, we propose a system model based on cooperative non-orthogonal multiple access (NOMA) for simultaneous wireless information and power transfer (SWIPT) within a full-duplex (FD) communication framework. We investigate two protocols, the time switching protocol (TSR) and the power splitting protocol (PSR), designed to accommodate both delay-tolerant transmission (DTT) and delay-limited transmission (DLT), thereby improving data processing and energy harvesting (EH). We present explicit formulas for key performance measures such as energy efficiency, ergodic rate, throughput, and outage probability. These measures are evaluated under numerous settings, including inter-user separation, required spectral efficiency, EH efficiency, and time and power splitting ratios, in moderate-to-high signal-to-noise ratio scenarios. The results show improved EH efficiency and hence improved transmission reliability. Importantly, NOMA in the proposed system model is shown to perform considerably better than traditional orthogonal multiple access.
Title: Full-duplex cooperative relaying systems for simultaneous wireless information and power transfer with non-orthogonal multiple access. Journal: Digital Signal Processing, vol. 156, Article 104817.
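To make the difference between time switching and power splitting concrete, the sketch below computes harvested energy under the common linear energy-harvesting model. This is a generic textbook model, not the formulas derived in the paper: the function names, the parameter names (`alpha` for the time switching ratio, `rho` for the power splitting ratio, `eta` for EH efficiency), and the numerical values are all assumptions.

```python
def harvested_energy_ts(p_tx, gain, eta, alpha, block_time):
    """Time switching: harvest at full received power for a fraction alpha of the block."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return eta * p_tx * gain * alpha * block_time

def harvested_energy_ps(p_tx, gain, eta, rho, rx_time):
    """Power splitting: harvest a fraction rho of received power for the whole reception time."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return eta * rho * p_tx * gain * rx_time

# Hypothetical numbers: 1 W transmit power, channel power gain 0.5, 80% EH efficiency.
e_ts = harvested_energy_ts(p_tx=1.0, gain=0.5, eta=0.8, alpha=0.25, block_time=1.0)
e_ps = harvested_energy_ps(p_tx=1.0, gain=0.5, eta=0.8, rho=0.5, rx_time=1.0)
print(e_ts, e_ps)
```

In both cases a larger ratio yields more harvested energy but leaves less time (time switching) or less signal power (power splitting) for information decoding, which is why the ratios appear among the evaluated settings.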
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104820
Luis Felipe Parra-Gallego, Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave
Customer satisfaction (CS) evaluation in call centers is essential for assessing service quality but commonly relies on human evaluations. Automatic evaluation systems can perform CS analyses and thereby enable the evaluation of larger datasets. This paper focuses on CS analysis through a multimodal approach that employs speech and language representations derived from real-world voicemails. Additionally, given the similarity between evaluating a provided service (which may elicit different emotions in customers) and automatically classifying emotions in speech, we also explore emotion recognition with the well-known IEMOCAP corpus, which comprises four classes corresponding to different emotional states. We incorporate a language representation with word embeddings based on a CNN-LSTM model, and three different self-supervised learning (SSL) speech encoders, namely Wav2Vec2.0, HuBERT, and WavLM. A bidirectional alignment network based on attention mechanisms synchronizes the speech and language representations. Three different fusion strategies are also explored. According to our results, the GGF model outperformed both unimodal and other multimodal methods in the 4-class emotion recognition task on the IEMOCAP dataset and in the binary CS classification task on the KONECTADB dataset. The study also demonstrated superior performance of our methodology compared to previous works on KONECTADB in both unimodal and multimodal approaches.
Title: Multimodal evaluation of customer satisfaction from voicemails using speech and language representations. Journal: Digital Signal Processing, vol. 156, Article 104820.
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104819
Yong Guo, Lidong Yang
To enhance the resolution of the synchrosqueezing transform (SST) in non-stationary signal representation, an optimization synchrosqueezed fractional wavelet transform (SSFRWT) is proposed, which has a rigorous mathematical basis and high resolution. First, the definition, properties, and principles of SSFRWT are presented. On this basis, a time-fractional-frequency (TFF) analysis method is established using SSFRWT. The experimental results demonstrate that SSFRWT can establish a high-resolution TFF representation for chirp-type signals, surpassing existing methods in noise robustness and energy concentration. Lastly, leveraging the TFF representation of the signal, SSFRWT is applied to chirp signal parameter estimation and multi-component signal separation, yielding better estimates and reconstructed signals than SST. Notably, SSFRWT is also applied to optical measurement, achieving high-precision measurement of the curvature radius of a convex lens.
Title: An optimization synchrosqueezed fractional wavelet transform for TFF analysis and its applications. Journal: Digital Signal Processing, vol. 156, Article 104819.
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104830
Van Son Nguyen, Bui Anh Duc, Tran Manh Hoang, Xuan Nam Tran, Pham Thanh Hiep, Nguyen Thu Phuong
In this paper, we investigate the user throughputs of a Cell-Free (CF) system with multiple aerial relay stations (ARSs), where each ARS is defined as an unmanned aerial vehicle (UAV)-mounted relay station. The system operates under a decode-and-forward (DF) protocol and facilitates connectivity between a terrestrial base station (TBS) and terrestrial users. ARSs are equipped with multiple antennas and simultaneously serve single-antenna users distributed in a specific area. Additionally, a small-cell (SC) system based on the CF structure, where each ARS serves the one user with the best channel conditions, is also considered. We analyze system communication in two stages, user-ARS links and ARS-TBS links, and then derive expressions for the data rates of users and ARSs. Moreover, we propose the spatial pilot reassignment (SPR) algorithm to optimize pilot assignment, improving channel estimation over random pilot assignment. The user throughput is evaluated while varying several system parameters: the presence or absence of data power control, the number of users, the number of ARSs, and the time interval allocated for channel estimation. The results show that the SPR algorithm improves throughput by about 10% compared to the random pilot assignment method at the 90%-likely user throughput, which corresponds to a cumulative distribution function value of 0.1.
Title: Spatial pilot reassignment algorithm for channel estimation stage of cell-free multi-ARS communication systems. Journal: Digital Signal Processing, vol. 156, Article 104830.
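The "90%-likely user throughput" mentioned in the abstract above is the throughput that 90% of users meet or exceed, that is, the point where the empirical cumulative distribution function equals 0.1. A minimal sketch of that computation follows. The throughput samples are synthetic placeholders, not data from the paper, and the helper name `likely_throughput` is an assumption.

```python
import numpy as np

def likely_throughput(samples, likelihood=0.9):
    """Throughput exceeded by a fraction `likelihood` of users.

    This is the (1 - likelihood) quantile of the samples, i.e. the point
    where the empirical CDF reaches 1 - likelihood (0.1 for 90%-likely).
    """
    return float(np.quantile(np.asarray(samples, dtype=float), 1.0 - likelihood))

# Synthetic per-user throughputs (arbitrary units) for two hypothetical schemes.
baseline = np.arange(1.0, 101.0)   # 1, 2, ..., 100
alternative = baseline + 2.0       # every user gains 2 units

t_base = likely_throughput(baseline)
t_alt = likely_throughput(alternative)
relative_gain = t_alt / t_base - 1.0
print(t_base, t_alt, relative_gain)
```

Comparing schemes at this low quantile emphasizes the worst-served users rather than the average, which is why it is a common fairness-oriented metric in cell-free system evaluations.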
Pub Date: 2024-10-28; DOI: 10.1016/j.dsp.2024.104831
Jiaqiang Yang, Danyang Qin, Huapeng Tang, Sili Tao, Haoze Bie, Lin Ma
As a key application of Internet of Things (IoT) technology, visual localization plays an important role in everyday life. However, pedestrians in images can obstruct environmental features, negatively impacting the performance of visual localization systems. To address this issue, we propose a Spatial Pyramid-Enhanced MixVPR visual localization method (SPE-VL) that aims to enhance image feature descriptions through multi-scale spatial information, thereby mitigating the effects of pedestrian occlusion on localization accuracy. The SPE-VL method is divided into two main phases: sensor-based matching range constraint and image feature extraction and matching. In the matching range constraint phase, we propose a direction decision method based on a machine learning classifier that utilizes smartphone sensor data to restrict the direction of image matching, reducing the likelihood of mismatches. In the feature extraction and matching phase, we propose a Transformer-based feature cross-enhancement method that leverages local features and spatial contextual information to enhance features, improving both image retrieval accuracy and localization precision. Experimental results indicate that the SPE-VL method demonstrates higher localization accuracy and robustness compared to existing state-of-the-art methods, providing new insights and solutions for the application of visual localization in complex environments.
{"title":"A novel spatial pyramid-enhanced indoor visual positioning method","authors":"Jiaqiang Yang , Danyang Qin , Huapeng Tang , Sili Tao , Haoze Bie , Lin Ma","doi":"10.1016/j.dsp.2024.104831","DOIUrl":"10.1016/j.dsp.2024.104831","url":null,"PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104831"},"PeriodicalIF":2.9,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142554983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-24 DOI: 10.1016/j.dsp.2024.104834
Qilei Xu, Longen Liu, Fangkun Zhang, Xu Ma, Ke Sun, Fengying Cui
Real-time, accurate recognition of abnormal behavior among factory personnel helps enhance their awareness of hazardous environments, thereby reducing the occurrence of accidents. This paper proposes a behavior recognition network based on an attention mechanism and a high-efficiency convolution module. Bi-Level Routing Attention is introduced into the backbone network, effectively strengthening the network's attention to the target region. The neck network is improved with the ConvNeXt Block module, which further raises recognition accuracy while reducing model complexity. Thirteen additional recognition models, each enhancing the original network from a different perspective, were constructed, and the mean average precision and detection speed of each model were evaluated. Experimental results demonstrate that the proposed network significantly improves detection accuracy, that its detection speed meets real-time requirements, and that its overall performance is the best among the improved networks compared. The proposed model can accurately identify a variety of abnormal behaviors of factory personnel in real time, which gives it practical value for personnel safety identification in factories.
{"title":"An intelligent recognition method of factory personnel behavior based on deep learning","authors":"Qilei Xu , Longen Liu , Fangkun Zhang , Xu Ma , Ke Sun , Fengying Cui","doi":"10.1016/j.dsp.2024.104834","DOIUrl":"10.1016/j.dsp.2024.104834","url":null,"PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104834"},"PeriodicalIF":2.9,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142554923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}