IEEE Signal Processing Letters: Latest Articles
A RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-08 | DOI: 10.1109/LSP.2024.3475969
Claudio Rota;Marco Buzzelli;Simone Bianco;Raimondo Schettini
Low-light video enhancement (LLVE) has received little attention compared to low-light image enhancement (LLIE) mainly due to the lack of paired low-/normal-light video datasets. Consequently, a common approach to LLVE is to enhance each video frame individually using LLIE methods. However, this practice introduces temporal inconsistencies in the resulting video. In this work, we propose a recurrent neural network (RNN) that, given a low-light video and its per-frame enhanced version, produces a temporally consistent video preserving the underlying frame-based enhancement. We achieve this by training our network with a combination of a new forward-backward temporal consistency loss and a content-preserving loss. At inference time, we can use our trained network to correct videos processed by any LLIE method. Experimental results show that our method achieves the best trade-off between temporal consistency improvement and fidelity with the per-frame enhanced video, exhibiting a lower memory complexity and comparable time complexity with respect to other state-of-the-art methods for temporal consistency.
Citations: 0
Prototypical Metric Segment Anything Model for Data-Free Few-Shot Semantic Segmentation
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-08 | DOI: 10.1109/LSP.2024.3476208
Zhiyu Jiang;Ye Yuan;Yuan Yuan
Few-shot semantic segmentation (FSS) is crucial for image interpretation, yet it is constrained by requirements for extensive base data and a narrow focus on foreground-background differentiation. This work introduces Data-free Few-shot Semantic Segmentation (DFSS), a task that requires limited labeled images and forgoes the need for extensive base data, allowing for comprehensive image segmentation. The proposed method utilizes the Segment Anything Model (SAM) for its generalization capabilities. The Prototypical Metric Segment Anything Model is introduced, featuring an initial segmentation phase followed by prototype matching, effectively addressing the learning challenges posed by limited data. To enhance discrimination in multi-class segmentation, the Supervised Prototypical Contrastive Loss (SPCL) is designed to refine prototype features, ensuring intra-class cohesion and inter-class separation. To further accommodate intra-class variability, the Adaptive Prototype Update (APU) strategy dynamically refines prototypes, adapting the model to class heterogeneity. The method's effectiveness is demonstrated through superior performance over existing techniques on the DFSS task, marking a significant advancement in UAV image segmentation.
Citations: 0
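The prototype-matching step mentioned in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature vectors, the class names, the use of mean features as prototypes, and cosine similarity as the metric are not taken from the letter, and the SPCL loss and APU update are not shown.

```python
import math


def class_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def match(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda lab: cosine(query, prototypes[lab]))


# Hypothetical 2-D features for segments from a few labeled support images.
support_features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.8]]
support_labels = ["road", "road", "roof", "roof"]
prototypes = class_prototypes(support_features, support_labels)

print(match([0.8, 0.2], prototypes))
print(match([0.1, 0.7], prototypes))
```

In a pipeline like the one the abstract outlines, the query vectors would come from segments produced by an initial segmentation stage; here they are hand-written numbers.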
RS-2-BP: A Unified Deep Learning Framework for Deriving EIT-Based Breathing Patterns From Respiratory Sounds
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-08 | DOI: 10.1109/LSP.2024.3475358
Arka Roy;Udit Satija
Respiratory disorders have become the third largest cause of death worldwide, which can be assessed by one of the two key diagnostic modalities: breathing patterns (BPs) or the airflow signals, and respiratory sounds (RSs). In recent years, few studies have been conducted on finding correlation between these two modalities which indicate the structural flaws of lungs under disease condition. In this letter, we propose ‘RS-2-BP’: a unified deep learning framework for deriving the electrical impedance tomography-based airflow signals from respiratory sounds using a hybrid neural network architecture, namely ReSTL, that comprises cascaded standard and residual shrinkage convolution blocks, followed by feature refined transformer encoders and long-short term memory (LSTM) units. The proposed framework is extensively evaluated using the publicly available BRACETS dataset. Experimental results suggest that our ReSTL can accurately derive the BPs from RSs with an average mean absolute error of $0.024 \pm 0.011$, $0.436 \pm 0.120$, $0.020 \pm 0.011$, $0.134 \pm 0.068$, and $0.031 \pm 0.019$, respectively for five different tasks. Furthermore, these derived BPs can be used for extracting different respiratory vitals, identifying disease conditions efficiently, and retrieving salient breathing cycle information from the RSs.
Citations: 0
MUSIC Based Multipath Delay Estimation Method in the Fractional Domain for OFDM-LFM
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475356
Jiaojiao Liu;Erdi Chen;Nan Sun;Biyun Ma
This letter proposes a high-resolution multipath time delay estimation (TDE) method for orthogonal frequency division multiplexing linear frequency modulation (OFDM-LFM) signals. Leveraging the expression of OFDM-LFM signals in the fractional domain, where the compressed subcarriers conform to a linear uniform arrangement, the algorithm combines with the multiple signal classification (MUSIC) algorithm for TDE. Simulation results show that regardless of the presence of Doppler effect, OFDM-LFM results in less relative root mean square error (RRMSE) compared to orthogonal frequency division multiplexing (OFDM). Furthermore, the superiority of OFDM-LFM signals is particularly evident at lower signal-to-noise ratios (SNRs). So the proposed algorithm offers promising implications for TDE in mobile scenarios.
Citations: 0
Kendall's Tau Based Spectrum Sensing for Cognitive Radio in the Presence of Laplace Noise
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475916
Yongjian Huang;Huadong Lai;Jisheng Dai;Weichao Xu
In the presence of non-Gaussian noise, traditional spectrum sensing techniques optimized for Gaussian noise may experience significant performance degradation. To address this challenge, this paper employs Kendall's tau (KT) as a detector to detect the primary signal in additive Laplace noise. Unlike techniques relying on fundamental information from raw observation data, this detector utilizes ranks to reduce the impact of impulsive component, thus being robust against large valued outliers. The analytic expressions concerning the expectation and variance of KT under Laplace noise are firstly established. Performance analyses are further conducted in terms of false alarm probability and detection probability. Monte Carlo simulations not only verified the correctness of the established theoretical results, but also demonstrated the superiority of KT over other commonly used methods in terms of detection probability under Laplace noise.
Citations: 0
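The rank-based robustness that the abstract above attributes to Kendall's tau can be seen in a small sketch. This is the textbook tau-a statistic (concordant minus discordant pairs, divided by the number of pairs), not the letter's detector: how the letter forms its sample pairs and sets its threshold is not stated in the abstract, so the sequences below are made-up numbers.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1   # pair ordered the same way in x and y
        elif prod < 0:
            score -= 1   # pair ordered oppositely; ties contribute 0
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


x = [1, 2, 3, 4, 5]
clean = [1, 2, 3, 4, 5]
spiky = [1, 2, 3, 4, 500]      # one impulsive sample, ranks unchanged

print(kendall_tau(x, clean))   # 1.0
print(kendall_tau(x, spiky))   # 1.0, the outlier's magnitude is ignored
```

Because only the sign of each pairwise difference enters the statistic, a single very large sample can flip at most the pairs it takes part in, which is the intuition behind the robustness claim.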
Optimizing Subband Adaptive Filters for Resilience Against Unanticipated Signal Truncation
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475349
Yuhong Wang;Xu Zhou;Zongsheng Zheng
This letter addresses a common issue in engineering applications: unanticipated signal truncation events caused by the mismatch between the operational range of measurement devices and the signals to be measured. Under such circumstances, the conventional normalized subband adaptive filtering (NSAF) algorithm significantly underperforms and may even fail to converge. To tackle this issue, we propose an improved NSAF algorithm. We introduce an expectation maximization framework to address the maximum likelihood estimation before the subband adaptive filter, specifically to handle double-sided signal truncation. This new approach leads to an NSAF for unanticipated truncation (UT-NSAF), which has been theoretically and numerically proven to be unbiased. Importantly, our research demonstrates that UT-NSAF significantly outperforms other algorithms in terms of estimation accuracy and convergence speed. Notably, the steady-state solution of UT-NSAF remains almost unaffected by varying truncation thresholds, showing robustness crucial for dealing with various unexpected signal truncation scenarios in engineering applications.
Citations: 0
Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475350
Wenjie Zhang;Peng Yu;Jun Yin;Xiaoheng Jiang;Mingliang Xu
In sound event localization and detection (SELD), traditional methods often treat localization and detection algorithms separately from data augmentation. During the model training process, the strategy for data augmentation is typically implemented in a non-learnable manner. Existing audio data augmentation strategies struggle to find optimal parameter solutions for data augmentation that can be effectively applied to SELD systems. To address this challenge, we introduce an innovative network-based strategy, termed the Automated Audio Data Augmentation (AADA) network. This strategy employs bi-level optimization to synergistically integrate audio data augmentation techniques with SELD tasks. In the AADA network, the lower-level SELD task serves as a constraint for the higher-level data augmentation process. The audio data augmentation parameters are adaptively optimized by utilizing the transfer of intermediate feature information from the SELD tasks, thus obtaining optimal parameters for these tasks. Evaluation of our approach on the Sony-TAU Realistic Spatial Soundscapes 2023 dataset achieves a SELD score of 0.4801, significantly surpassing the performance metrics of all traditional data augmentation strategies for SELD.
Citations: 0
A Momentum Accelerated Algorithm for ReLU-Based Nonlinear Matrix Decomposition
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475910
Qingsong Wang;Chunfeng Cui;Deren Han
Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD), we propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. Subsequently, we introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. A distinctive feature, setting our work apart from most existing studies, is the incorporation of both positive and negative momentum parameters in our algorithm. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm.
Citations: 0
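The ReLU-based decomposition described in the abstract above can be made concrete with a small sketch of a regularized objective. The exact form used in the letter is not given in the abstract; the version below is an assumption: a factorized low-rank matrix W·H, a squared Frobenius fit term on X − max(0, W·H), and a Tikhonov penalty (lam/2)·(‖W‖² + ‖H‖²). The function names and the numbers are hypothetical, and the momentum-accelerated solver itself is not shown.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sq_frobenius(m):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(v * v for row in m for v in row)


def relu_nmd_t_objective(x, w, h, lam):
    """Assumed objective: 0.5*||X - max(0, WH)||_F^2 + 0.5*lam*(||W||_F^2 + ||H||_F^2)."""
    theta = matmul(w, h)
    residual = [[x_ij - max(0.0, t_ij) for x_ij, t_ij in zip(x_row, t_row)]
                for x_row, t_row in zip(x, theta)]
    fit = 0.5 * sq_frobenius(residual)
    penalty = 0.5 * lam * (sq_frobenius(w) + sq_frobenius(h))
    return fit + penalty


# A sparse nonnegative X of rank 2 ...
x = [[1.0, 0.0],
     [0.0, 2.0]]
# ... reproduced exactly by applying ReLU to a rank-1 product W·H.
w = [[1.0], [-1.0]]
h = [[1.0, -2.0]]

print(relu_nmd_t_objective(x, w, h, lam=0.0))   # fit term only
print(relu_nmd_t_objective(x, w, h, lam=0.1))   # fit term plus penalty
```

The example illustrates why the per-element nonlinearity matters: the rank-1 product W·H has negative entries, and the ReLU zeroes them out to recover a rank-2 sparse matrix, so the fit term vanishes and only the penalty remains.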
Understanding Correlated Information Diffusion: From a Graphical Evolutionary Game Perspective
IF 3.2 | Zone 2, Engineering Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-10-07 | DOI: 10.1109/LSP.2024.3475353
Hong Hu;Zhuoqun Li;H. Vicky Zhao
In online social networks, millions of connected intelligent individuals actively interact with each other, which not only facilitates opinion sharing but also offers the platform to spread detrimental gossips and rumors. Therefore, it is of crucial importance to better understand how the avalanche of information propagates over social networks and affects our social life and economy. However, most model-based works on information diffusion either consider the spreading of one single message or assume that different information spreads independently. In this letter, we investigate how correlated information spreads together and jointly influences users' decisions from a graphical evolutionary game perspective. We model the multi-source information diffusion process, analyze the impact of information's correlation and time delay on the evolutionary dynamics and the evolutionary stable states (ESS). Simulation results on synthetic networks and Facebook real-world networks are consistent with our analytical results. This investigation offers important insights to the understanding and management of multi-source information diffusion.
Citations: 0
Random Tensor Analysis: Outlier Detection and Sample-Size Determination
IF 3.2 Zone 2 Engineering & Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-10-07 DOI: 10.1109/LSP.2024.3475909
Shih Yu Chang;Hsiao-Chun Wu
High-dimensional signal processing and data analysis have attracted researchers in recent decades. Outlier detection and sample-size determination are two essential pre-processing tasks for many signal processing applications. However, fast outlier detection for tensor data of arbitrary order is still in high demand. Furthermore, sample-size determination for random tensor data has not been addressed in the literature. To fill this knowledge gap, we first derive new tensor Chernoff tail bounds for random Hermitian tensors. Based on these tail bounds, we propose a novel approach for joint outlier detection and sample-size determination. Through numerical evaluation on a variety of real tensor data, we also investigate the mathematical relationship among the outlier-threshold (sample-size-threshold) probability, the outlier-threshold spectrum, and the critical sample size, along with the reduction in computational complexity that our proposed analytic approach achieves over existing methods.
Citations: 0