
IEEE Signal Processing Letters: Latest Articles

KFA: Keyword Feature Augmentation for Open Set Keyword Spotting
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-22 · DOI: 10.1109/LSP.2024.3484932
Kyungdeuk Ko;Bokyeung Lee;Jonghwan Hong;Hanseok Ko
In recent years, with the advancement of deep learning and the emergence of smart devices, there has been growing interest in keyword spotting (KWS), which is used to activate AI systems with automatic speech recognition and text-to-speech. However, smart devices with KWS often produce false alarms when given unexpected words. To address this issue, existing KWS methods typically train non-target words as an unknown class. Even so, unseen words that were not included in the unknown class during training can still be misclassified as one of the target words. To overcome this limitation, we propose a new method named Keyword Feature Augmentation (KFA) for open-set KWS. KFA performs feature augmentation through adversarial learning that increases the loss, and the augmented features are constrained to a limited space using label smoothing. Unlike other generative-model-based open set recognition (OSR) methods, KFA requires no additional training parameters and no repeated operations at inference. KFA achieves an AUROC of 0.955 and a target-class accuracy of 97.34% on Google Speech Commands V1, and an AUROC of 0.959 and a target-class accuracy of 98.17% on Google Speech Commands V2, the highest performance among the OSR methods compared.
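The adversarial feature augmentation idea can be illustrated with a minimal sketch. It assumes a linear softmax classifier on top of keyword embeddings, a single FGSM-style sign-gradient step of size `epsilon` taken in feature space, and a label-smoothed target for the augmented features. The names `fgsm_feature_augment`, `smoothed_target`, `epsilon` and `smoothing` are hypothetical, and this is not the letter's exact formulation.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()  # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def cross_entropy(W, b, x, target):
    """Cross-entropy of a linear softmax classifier (logits = W x + b) for a soft target."""
    return -np.sum(target * log_softmax(W @ x + b))

def fgsm_feature_augment(W, b, x, y_onehot, epsilon=0.1):
    """One sign-gradient step on the feature x in the direction that increases the loss."""
    p = np.exp(log_softmax(W @ x + b))
    grad_x = W.T @ (p - y_onehot)  # gradient of the cross-entropy w.r.t. x
    return x + epsilon * np.sign(grad_x)

def smoothed_target(y_onehot, smoothing=0.2):
    """Label smoothing: blend the one-hot label with the uniform distribution."""
    return (1.0 - smoothing) * y_onehot + smoothing / y_onehot.size

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 8)), np.zeros(4)  # 4 keyword classes, 8-dim features
x, y = rng.normal(size=8), np.eye(4)[2]       # one feature vector labeled as class 2
x_aug = fgsm_feature_augment(W, b, x, y)
print(cross_entropy(W, b, x, y), cross_entropy(W, b, x_aug, y))
```

Because this loss is convex in `x`, the sign step is guaranteed to raise it. In training, the augmented feature would then be fitted to `smoothed_target(y)` instead of the hard label.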
Citations: 0
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483009
Mehmet Hamza Erol;Arda Senocak;Jiu Feng;Joon Son Chung
Transformers have rapidly become the preferred choice for audio classification, surpassing CNN-based methods. However, Audio Spectrogram Transformers (ASTs) scale quadratically with the number of input tokens because of self-attention, so removing this quadratic cost is an appealing direction. Recently, state space models (SSMs) such as Mamba have shown potential in this regard on language and vision tasks. In this study, we ask whether self-attention is necessary for audio classification. To answer this question, we introduce Audio Mamba (AuM), the first self-attention-free, purely SSM-based model for audio classification. We evaluate AuM on six audio benchmarks, where it achieves performance comparable to or better than the well-established AST model.
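To make the linear-versus-quadratic contrast concrete, here is a minimal sketch of a bidirectional diagonal linear state-space scan over a token sequence. It is a simplified, time-invariant SSM with a fixed per-state decay `a`, input matrix `B` and readout `C`; Mamba's selective scan makes these parameters input-dependent, and AuM's actual blocks are not reproduced here. All names are illustrative assumptions. The cost grows linearly with the number of tokens `T`, whereas self-attention builds a `T x T` score matrix.

```python
import numpy as np

def ssm_scan(x, a, B, C):
    """Causal diagonal SSM: h_t = a * h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in) token sequence; a: (n,) per-state decay in (0, 1);
    B: (n, d_in); C: (d_out, n). One pass over the tokens, O(T) time.
    """
    h = np.zeros(a.shape[0])
    ys = []
    for x_t in x:
        h = a * h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

def bidirectional_ssm(x, fwd, bwd):
    """Sum a forward scan and a scan over the time-reversed sequence."""
    y_f = ssm_scan(x, *fwd)
    y_b = ssm_scan(x[::-1], *bwd)[::-1]  # flip back to the original order
    return y_f + y_b, y_f, y_b

def make_params(rng, n, d):
    """Hypothetical random SSM parameters (decay, input matrix, readout)."""
    return rng.uniform(0.5, 0.95, n), rng.normal(size=(n, d)), rng.normal(size=(d, n))

rng = np.random.default_rng(0)
T, d, n = 16, 4, 8  # tokens (e.g. spectrogram patches), model width, state size
fwd, bwd = make_params(rng, n, d), make_params(rng, n, d)
x = rng.normal(size=(T, d))
y, y_f, y_b = bidirectional_ssm(x, fwd, bwd)
print(y.shape)  # (16, 4)
```

The forward scan only sees past tokens and the backward scan only sees future ones; summing them gives every position context from both directions, which is what a non-causal audio classifier needs.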
Citations: 0
System-Informed Neural Network for Frequency Detection
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483036
Sunyoung Ko;Myoungin Shin;Geunhwan Kim;Youngmin Choo
We devise a deep learning-based frequency analysis scheme, the system-informed neural network (SINN), that takes the corresponding linear system model into account. SINN adopts the adaptive learned iterative soft shrinkage algorithm as its network architecture and includes the system model in the loss function. Like a physics-informed neural network, it finds a solution that satisfies the system model, and it generalizes well with fast processing time. To further improve SINN, multiple measurements are exploited by assuming that frequency components are common across the measurements. SINN is evaluated on simulated acoustic data and compared with the Fourier transform and sparse Bayesian learning (SBL) in terms of detection/false-alarm rate and mean squared error. In in-situ data tests, SINN, like SBL, reveals clear frequency components by effectively reducing noise. Finally, SINN is applied to noisy passive sonar signals containing 43 frequency components, many of which are recovered.
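SINN builds its network on a learned, adaptive variant of iterative soft shrinkage. For orientation, the classical non-learned iterative shrinkage-thresholding algorithm (ISTA) can be applied to the same kind of linear system model, y = A x + noise. In this sketch, `A` is an assumed overcomplete cosine/sine dictionary on a 5 Hz frequency grid. In SINN the step sizes, thresholds and matrices are learned rather than fixed, so this is only a baseline illustration; `ista`, `lam`, the grid and the test signal are illustrative choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft shrinkage operator: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the data-term gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

fs, N = 1000.0, 256                 # sample rate (Hz) and number of samples
t = np.arange(N) / fs
grid = np.arange(0.0, 500.0, 5.0)   # candidate frequencies (Hz)
A = np.hstack([np.cos(2 * np.pi * np.outer(t, grid)),
               np.sin(2 * np.pi * np.outer(t, grid))])
rng = np.random.default_rng(1)
y = 1.0 * np.cos(2 * np.pi * 120 * t) + 0.7 * np.sin(2 * np.pi * 310 * t)
y = y + 0.1 * rng.normal(size=N)
x_hat = ista(A, y, lam=5.0)
power = x_hat[:grid.size] ** 2 + x_hat[grid.size:] ** 2  # per-frequency energy
top2 = sorted(grid[np.argsort(power)[-2:]])
print(top2)
```

The residual term `A @ x - y` is where the system model enters each iteration; SINN additionally places that model in its training loss.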
Citations: 0
RFI-Aware and Low-Cost Maximum Likelihood Imaging for High-Sensitivity Radio Telescopes
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483011
J. Wang;M. N. El Korso;L. Bacharach;P. Larzabal
This paper addresses interference mitigation and computational cost reduction in radio interferometric imaging. We propose a novel maximum-likelihood methodology based on antenna sub-array switching, which strikes a careful balance between imaging accuracy and computational efficiency. In addition, we improve robustness to radio frequency interference (RFI) by modeling the additive noise as t-distributed. Simulation results show that the t-distributed noise model outperforms the conventional Gaussian noise model in scenarios with interference. We also show that the proposed switching approach achieves similar imaging performance with far fewer visibilities than the full-array configuration, thereby reducing computational complexity.
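The robustness of a t-distributed noise model to RFI shows up even on a toy problem: estimating a constant from samples, a few of which carry strong interference. This sketch uses the standard EM (iteratively reweighted) scheme for the location and scale of a Student-t distribution with fixed degrees of freedom `nu`, and compares it with the Gaussian maximum-likelihood estimate, which is the sample mean. It illustrates the noise model only, not the letter's imaging estimator; `nu`, the data and the function names are assumptions.

```python
import numpy as np

def student_t_location(x, nu=3.0, n_iter=100):
    """EM estimate of the location and scale of a Student-t sample with nu fixed."""
    mu = np.median(x)  # robust starting point
    sigma2 = np.var(x)
    for _ in range(n_iter):
        # E-step: expected precision weights; samples far from mu get small weights
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean and weighted scale
        mu = np.sum(w * x) / np.sum(w)
        sigma2 = np.sum(w * (x - mu) ** 2) / x.size
    return mu, sigma2, w

rng = np.random.default_rng(0)
true_value = 2.0
x = true_value + 0.1 * rng.normal(size=200)
x[:10] += 50.0  # a few samples hit by strong interference
mu_t, _, w = student_t_location(x)
print(f"Gaussian ML (mean): {x.mean():.3f}, Student-t ML: {mu_t:.3f}")
```

The weights `w` act as an automatic flag: samples corrupted by interference end up with weights close to zero, so they barely influence the estimate.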
Citations: 0
Order Estimation of Linear-Phase FIR Filters for DAC Equalization in Multiple Nyquist Bands
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-17 · DOI: 10.1109/LSP.2024.3483008
Deijany Rodriguez Linares;Håkan Johansson;Yinan Wang
This letter considers the design and properties of linear-phase finite-length impulse response (FIR) filters for equalizing the frequency responses of digital-to-analog converters (DACs). It derives estimates of the required filter order, as a function of bandwidth and equalization accuracy, for four DAC pulses used in DACs operating in multiple Nyquist bands. The estimates are obtained from a large set of minimax-optimal equalizers using symbolic regression, followed by minimax-optimal curve fitting for further refinement. Design examples demonstrate the accuracy of the proposed estimates. In addition, the letter discusses which of the four types of linear-phase FIR filters suit the different equalizer cases, as well as the corresponding properties of the equalized systems.
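The trade-off between filter order and equalization accuracy can be explored numerically. This sketch designs Type I linear-phase FIR equalizers for the zero-order-hold (rectangular) DAC pulse in the first Nyquist band, whose magnitude response is sinc(f/fs). It then reports the worst-case deviation of the equalized response from 1 for a few orders. It uses a least-squares fit on a dense grid rather than the minimax designs in the letter, covers only one of the four pulses, and `band_edge` and the orders tried are arbitrary assumptions.

```python
import numpy as np

def design_zoh_equalizer(half_order, band_edge=0.8 * np.pi, grid_size=2000):
    """Least-squares Type I linear-phase FIR approximating 1/sinc on [0, band_edge].

    half_order: M, giving 2*M + 1 taps (filter order 2*M).
    band_edge: upper edge of the equalized band in rad/sample (pi = Nyquist).
    Returns (h, max_abs_error) where the error is |A(w) * sinc - 1| on the grid.
    """
    w = np.linspace(0.0, band_edge, grid_size)
    # ZOH DAC magnitude response; np.sinc(x) = sin(pi x) / (pi x)
    dac = np.sinc(w / (2 * np.pi))
    target = 1.0 / dac
    # Type I amplitude response: A(w) = a_0 + sum_{k>=1} 2 a_k cos(k w)
    k = np.arange(half_order + 1)
    basis = np.cos(np.outer(w, k))
    basis[:, 1:] *= 2.0
    a, *_ = np.linalg.lstsq(basis, target, rcond=None)
    # Symmetric impulse response: h[M - k] = h[M + k] = a_k
    h = np.concatenate([a[:0:-1], a])
    max_err = np.max(np.abs((basis @ a) * dac - 1.0))
    return h, max_err

results = {}
for M in (2, 6, 12):
    h, err = design_zoh_equalizer(M)
    results[M] = (h, err)
    print(f"order {2 * M:2d} ({h.size} taps): max |error| = {err:.2e}")
```

Sweeping `half_order` and `band_edge` this way gives the kind of raw (order, bandwidth, accuracy) data from which an order-estimation formula can be fitted.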
Citations: 0
Learning Noise Adapters for Incremental Speech Enhancement
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-16 · DOI: 10.1109/LSP.2024.3482171
Ziye Yang;Xiang Song;Jie Chen;Cédric Richard;Israel Cohen
Incremental speech enhancement (ISE), which incrementally adapts a model to new noise domains, is a critical yet comparatively under-investigated topic. Regularization-based methods have been proposed for ISE, but they usually face a dilemma in which gains on one domain come directly at the cost of another. To solve this issue, we propose an effective paradigm, termed Learning Noise Adapters (LNA), which significantly mitigates catastrophic domain forgetting in ISE. We keep a pre-trained model frozen and train and retain a domain-specific adapter for each newly encountered domain, capturing the variations in feature distributions within these domains. For the inference stage, we further develop an unsupervised, training-free noise selector that identifies the domain of each test speech sample. Comprehensive experiments validate the effectiveness of our approach.
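The adapter-plus-selector pipeline can be sketched under explicitly stated assumptions. The adapters here are residual bottleneck layers, a common adapter design. The training-free selector picks the domain whose stored mean feature, computed from the frozen encoder, lies closest to the test utterance's mean feature. The abstract does not say how the LNA selector works, so `ResidualAdapter`, `NoiseAdapterBank`, the nearest-centroid rule and the domain names are all hypothetical.

```python
import numpy as np

class ResidualAdapter:
    """Bottleneck adapter: x + relu(x W_down^T) W_up^T. Only these weights would be trained."""
    def __init__(self, dim, bottleneck, rng):
        self.W_down = 0.1 * rng.normal(size=(bottleneck, dim))
        self.W_up = np.zeros((dim, bottleneck))  # zero init: starts as the identity map

    def __call__(self, feats):  # feats: (frames, dim)
        return feats + np.maximum(feats @ self.W_down.T, 0.0) @ self.W_up.T

class NoiseAdapterBank:
    """One adapter and one feature centroid per noise domain; the backbone stays frozen."""
    def __init__(self):
        self.adapters, self.centroids = {}, {}

    def register_domain(self, name, adapter, train_feats):
        self.adapters[name] = adapter
        self.centroids[name] = train_feats.mean(axis=0)

    def select(self, test_feats):
        """Training-free choice: stored centroid nearest to the utterance's mean feature."""
        m = test_feats.mean(axis=0)
        return min(self.centroids, key=lambda k: np.linalg.norm(m - self.centroids[k]))

    def __call__(self, test_feats):
        domain = self.select(test_feats)
        return domain, self.adapters[domain](test_feats)

rng = np.random.default_rng(0)
dim = 16
bank = NoiseAdapterBank()
domain_means = {"babble": rng.normal(size=dim), "street": rng.normal(size=dim)}
for name, mu in domain_means.items():
    feats = mu + 0.3 * rng.normal(size=(200, dim))  # stand-in for frozen-encoder features
    bank.register_domain(name, ResidualAdapter(dim, 4, rng), feats)

test = domain_means["street"] + 0.3 * rng.normal(size=(50, dim))
domain, enhanced_feats = bank(test)
print(domain)  # street
```

Adding a new noise domain only adds an entry to the bank, so earlier adapters are never overwritten; that is why this style of design sidesteps the forgetting problem that regularization-based methods face.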
Citations: 0
Maximum Entropy and Quantized Metric Models for Absolute Category Ratings
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-15 · DOI: 10.1109/LSP.2024.3480832
Dietmar Saupe;Krzysztof Rusek;David Hägele;Daniel Weiskopf;Lucjan Janowski
The datasets of most image quality assessment studies contain ratings on a five-level categorical scale, from bad (1) to excellent (5). For each stimulus, the ratings from 1 to 5 are summarized as a mean opinion score (MOS). In this study, we investigate families of multinomial probability distributions, parameterized by mean and variance, for fitting the empirical rating distributions. To this end, we consider quantized metric models based on continuous distributions that model perceived stimulus quality on a latent scale; the probabilities of the rating categories are obtained by quantizing the corresponding random variables with threshold values. Furthermore, we introduce a novel discrete maximum entropy distribution for a given mean and variance. We compare these models with the state of the art, the generalized score distribution, on two large datasets, KonIQ-10k and VQEG HDTV. Given an input distribution of ratings, our fitted two-parameter models predict unseen ratings better than the empirical distribution does. In contrast to empirical distributions of absolute category ratings and their discrete models, our continuous models provide fine-grained estimates of quality-of-experience quantiles, which service providers need in order to satisfy a given fraction of the user population.
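A quantized metric model takes only a few lines to write down. This sketch assumes a normal latent quality variable with mean `mu` and standard deviation `sigma`, binned at fixed thresholds 1.5, 2.5, 3.5 and 4.5. The letter studies families of continuous distributions and may place the thresholds differently, so both choices are assumptions.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def quantized_normal_probs(mu, sigma, thresholds=(1.5, 2.5, 3.5, 4.5)):
    """P(rating = k), k = 1..5, by binning a N(mu, sigma^2) latent score at the thresholds."""
    cdf = [0.0] + [normal_cdf(t, mu, sigma) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def mos(probs):
    """Mean opinion score implied by the category probabilities for ratings 1..5."""
    return sum(k * p for k, p in zip(range(1, 6), probs))

for mu in (3.0, 4.5):
    p = quantized_normal_probs(mu, sigma=1.0)
    print(mu, [round(q, 3) for q in p], round(mos(p), 3))
```

Near the ends of the scale the implied MOS is pulled toward the middle (for `mu = 4.5` it falls below 4.5), because all latent mass above 4.5 collapses onto rating 5. This is one reason a latent continuous model can say more than the MOS alone.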
Citations: 0
Pose-Promote: Progressive Visual Perception for Activities of Daily Living
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-14 · DOI: 10.1109/LSP.2024.3480046
Qilang Ye;Zitong Yu
Poses are effective for interpreting fine-grained human activities, especially when the visual information is complex. Unimodal action recognition methods perform unsatisfactorily on daily activities because they lack a more comprehensive perspective, and multimodal methods that combine pose and visual cues still do not fully mine their complementary information. We therefore propose a Pose-promote (Ppromo) framework that uses prior knowledge of pose joints to perceive visual information progressively. First, a temporal promote module activates each video segment using temporally synchronized joint weights. Next, a spatial promote module captures the key regions in the visual stream using learned pose attention. To further refine the bimodal associations, a global inter-promote module aligns global pose-visual semantics at feature granularity. Finally, a learnable late-fusion strategy combines the visual and pose streams for accurate inference. Ppromo achieves state-of-the-art performance on three publicly available datasets.
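Two of these stages can be roughly illustrated in code. The sketch first reweights per-segment visual features with weights derived from time-synchronized pose joint features, in the spirit of the temporal promote module. It then fuses the two streams' class probabilities with a single learnable weight (late fusion). The scoring rule, the sigmoid-weighted fusion and all names are hypothetical simplifications. The abstract does not specify the modules' exact form, and in Ppromo they are learned networks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_promote(visual, pose, w_score):
    """Reweight video segments using pose-derived importance.

    visual: (S, Dv) per-segment visual features
    pose: (S, J, Dp) per-segment joint features, time-synchronized with visual
    w_score: (Dp,) hypothetical joint-scoring vector (would be learned)
    """
    joint_scores = pose @ w_score                              # (S, J)
    seg_weights = softmax(joint_scores.mean(axis=1), axis=0)   # (S,), sums to 1
    # Rescale by S so that uniform weights leave the features unchanged
    return visual * seg_weights[:, None] * visual.shape[0], seg_weights

def late_fusion(logits_visual, logits_pose, alpha):
    """Convex combination of the two streams' class probabilities; alpha is learnable."""
    g = 1.0 / (1.0 + np.exp(-alpha))  # squash the fusion weight into (0, 1)
    return g * softmax(logits_visual) + (1.0 - g) * softmax(logits_pose)

rng = np.random.default_rng(0)
S, J, Dv, Dp, C = 8, 17, 32, 3, 10  # segments, joints, feature dims, classes
visual = rng.normal(size=(S, Dv))
pose = rng.normal(size=(S, J, Dp))
promoted, seg_weights = temporal_promote(visual, pose, rng.normal(size=Dp))
probs = late_fusion(rng.normal(size=C), rng.normal(size=C), alpha=0.0)
print(promoted.shape, probs.argmax())
```

Using a convex combination keeps the fused output a valid probability distribution whatever value `alpha` is trained to.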
Citations: 0
Learning Multidimensional Spatial Attention for Robust Nighttime Visual Tracking
IF 3.2 · Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2024-10-14 · DOI: 10.1109/LSP.2024.3480831
Qi Gao;Mingfeng Yin;Yuanzhi Ni;Yuming Bo;Shaoyi Bei
Recent advanced trackers that use nighttime image enhancement have markedly improved visual tracking performance at night. However, images recovered by current enhancement methods still have weaknesses, such as blurred target details and obvious noise. To this end, we propose MSA-SCT, a method that learns multidimensional spatial attention for robust nighttime visual tracking and is built on a spatial-channel transformer-based low-light enhancer (SCT). First, we design a novel multidimensional spatial attention (MSA) that aggregates channel and multi-scale spatial information to generate additional reliable feature responses, making the model more adaptable to the illumination conditions and noise levels in different regions of the image. Second, optimized skip connections limit the effects of redundant information and noise, which helps propagate the fine-detail features of nighttime images from low-level to high-level features and improves the enhancement. Finally, trackers equipped with the enhancer are tested on multiple tracking benchmarks, demonstrating the effectiveness and superiority of MSA-SCT.
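The phrase "aggregating channel and multi-scale spatial information" admits one plausible reading, sketched below. First, a channel-attention gate is computed from global average pooling. Then a spatial map is built from channel-wise mean and max maps, smoothed with box filters at several scales. This CBAM-style construction is an assumption for illustration, not the MSA module itself. So are the fixed box averaging (a learned module would use convolutions), the scales and the hypothetical weights `w_channel`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def box_filter(img, k):
    """Mean over a k x k window (k odd), edge-padded so the output keeps its size."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def multiscale_spatial_attention(feat, w_channel, scales=(3, 5, 7)):
    """feat: (C, H, W) feature map. Returns the reweighted features and the spatial map."""
    # Channel attention: global average pooling -> per-channel gate in (0, 1)
    pooled = feat.mean(axis=(1, 2))
    x = feat * sigmoid(w_channel @ pooled)[:, None, None]
    # Spatial descriptor aggregated across channels
    desc = x.mean(axis=0) + x.max(axis=0)  # (H, W)
    # Multi-scale aggregation: average of box-filtered maps at several window sizes
    multi = np.mean([box_filter(desc, k) for k in scales], axis=0)
    sp_map = sigmoid(multi)  # per-pixel weight in (0, 1)
    return x * sp_map[None], sp_map

rng = np.random.default_rng(0)
feat = rng.normal(size=(16, 24, 24))           # (channels, height, width)
w_channel = 0.1 * rng.normal(size=(16, 16))    # hypothetical learned weights
out, sp_map = multiscale_spatial_attention(feat, w_channel)
print(out.shape, float(sp_map.min()), float(sp_map.max()))
```

Averaging over several window sizes lets each pixel's weight reflect both local detail and wider regional context. That is one way to adapt to illumination and noise that vary across the image.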
Citations: 0
A Recurrent Spatio-Temporal Graph Neural Network Based on Latent Time Graph for Multi-Channel Time Series Forecasting
IF 3.2; Zone 2 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2024-10-14; DOI: 10.1109/LSP.2024.3479917
Linzhi Li;Xiaofeng Zhou;Guoliang Hu;Shuai Li;Dongni Jia
With the advancement of technology, multi-channel time series forecasting has become a focal point of research. In this context, spatio-temporal graph neural networks have attracted significant interest due to their strong performance. An established approach integrates graph convolutional networks into recurrent neural networks. However, this approach has difficulty capturing dynamic spatial correlations and discerning the correlations among multi-channel time series signals. Another major problem is that the discrete time interval of recurrent neural networks limits the accuracy of spatio-temporal prediction. To address these challenges, we propose a continuous spatio-temporal framework, termed Recurrent Spatio-Temporal Graph Neural Network based on Latent Time Graph (RST-LTG). RST-LTG combines adaptive graph convolutional networks with a time embedding generator to construct a latent time graph, which captures evolving spatial characteristics by aggregating spatial information across multiple time steps. Additionally, to improve the accuracy of continuous-time modeling, we introduce a gate-enhanced neural ordinary differential equation that effectively integrates information across multiple scales. Empirical results on four publicly available datasets demonstrate that RST-LTG outperforms 19 competing methods in terms of accuracy.
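The abstract does not give RST-LTG's equations, so the following is only a rough, hypothetical sketch of two generic ingredients it names: an adaptive graph learned from node embeddings, and a gated continuous-time (ODE) update driven by graph aggregation. The adjacency formula A = row_softmax(ReLU(E Eᵀ)) is the AGCRN-style adaptive adjacency, assumed here for illustration; the per-node gate, the Euler solver, and all function names (`adaptive_adjacency`, `propagate`, `gated_graph_ode`) are hypothetical, and the time embedding generator is not modeled at all. It is written in pure Python so it runs without extra packages.

```python
import math


def adaptive_adjacency(node_emb):
    """Learned graph from node embeddings: A = row_softmax(ReLU(E @ E^T)).

    AGCRN-style adaptive adjacency, assumed for illustration only;
    not taken from the RST-LTG paper.
    """
    n = len(node_emb)
    adj = []
    for i in range(n):
        # Dot-product similarity of node i with every node j, clipped at zero.
        scores = [max(0.0, sum(a * b for a, b in zip(node_emb[i], node_emb[j])))
                  for j in range(n)]
        # Numerically stable softmax over the row.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def propagate(adj, h):
    """One graph-convolution aggregation step: returns A @ h for h of shape n x d."""
    n, d = len(h), len(h[0])
    return [[sum(adj[i][j] * h[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_graph_ode(adj, h0, gate_logits, t_end=1.0, steps=10):
    """Integrate dh/dt = g * (A @ h - h) from t = 0 to t_end with explicit Euler.

    g = sigmoid(gate_logits[i]) is a hypothetical per-node gate controlling how
    fast node i moves toward its neighbourhood average; the actual RST-LTG gate
    is not specified in the abstract.
    """
    dt = t_end / steps
    h = [row[:] for row in h0]
    for _ in range(steps):
        agg = propagate(adj, h)
        # The right-hand side is built from the previous h before reassignment.
        h = [[h[i][k] + dt * sigmoid(gate_logits[i]) * (agg[i][k] - h[i][k])
              for k in range(len(h[i]))]
             for i in range(len(h))]
    return h


# Usage: three sensors; sensors 0 and 1 have similar embeddings.
E = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = adaptive_adjacency(E)
h0 = [[1.0], [0.0], [0.0]]  # one feature channel; the signal starts at sensor 0
h1 = gated_graph_ode(A, h0, gate_logits=[0.0, 0.0, 0.0])
print(h1)
```

Explicit Euler keeps the sketch short: because A is row-stochastic and dt·g ≤ 1, each update is a convex combination of current values, so states stay within the initial range, and sensor 1 (whose embedding resembles sensor 0's) picks up more of the signal than sensor 2. A practical neural-ODE model would typically learn the vector field and use an adaptive solver instead.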
Citations: 0