
Latest publications: IEEE Open Journal of Signal Processing

P-TAME: Explain Any Image Classifier With Trained Perturbations
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-03-09 · DOI: 10.1109/OJSP.2025.3568756
Mariano V. Ntrougkas;Vasileios Mezaris;Ioannis Patras
The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature. This paper introduces P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations), a model-agnostic method for explaining DNN-based image classifiers. P-TAME employs an auxiliary image classifier to extract features from the input image, bypassing the need to tailor the explanation method to the internal architecture of the backbone classifier being explained. Unlike traditional perturbation-based methods, which have high computational requirements, P-TAME offers an efficient alternative by generating high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers. Quantitative and qualitative results show that P-TAME matches or outperforms previous explainability methods, including model-specific ones.
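The single-forward-pass idea can be illustrated with a minimal sketch: an auxiliary feature extractor produces feature maps, a learned attention head collapses them into a mask in (0, 1), and upsampling that mask to image resolution yields the explanation. Everything below (the shapes, the random stand-in for the auxiliary classifier, the 1x1-convolution head) is a hypothetical illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def aux_features(image):
    # Stand-in for the auxiliary classifier's feature maps: (C, H/8, W/8).
    return rng.standard_normal((16, 28, 28))

def attention_head(feats, w):
    # 1x1 convolution collapsing the C channels into one map, then a sigmoid
    # so the mask lies in (0, 1).
    logits = np.tensordot(w, feats, axes=([0], [0]))  # -> (28, 28)
    return 1.0 / (1.0 + np.exp(-logits))

image = rng.random((3, 224, 224))
w = rng.standard_normal(16)
mask = attention_head(aux_features(image), w)

# Nearest-neighbour upsampling of the low-resolution mask to image size
# gives the final high-resolution explanation map.
explanation = np.kron(mask, np.ones((8, 8)))
```

The whole explanation is produced by one pass through the feature extractor and the attention head, with no iterative perturbation loop.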
Vol. 6, pp. 536-545.
Citations: 0
Generalized Metaplectic Convolution-Based Cohen's Class Time-Frequency Distribution: Theory and Application
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-25 · DOI: 10.1109/OJSP.2025.3545337
Manjun Cui;Zhichao Zhang;Jie Han;Yunjie Chen;Chunzheng Cao
The convolution type of the Cohen's class time-frequency distribution (CCTFD) is a useful and effective time-frequency analysis tool for signals corrupted by additive noise. However, it cannot meet the requirement of high-performance denoising under low signal-to-noise ratio conditions. In this paper, we define the generalized metaplectic convolution-based Cohen's class time-frequency distribution (GMC-CCTFD) by replacing the traditional convolution operator in the CCTFD with the generalized convolution operator of the metaplectic transform (MT). This new definition leverages the high degrees of freedom and flexibility of the MT, improving performance in non-stationary signal analysis. We then establish a fundamental theory of the GMC-CCTFD's essential properties. By integrating the Wiener filter principle with the time-frequency filtering mechanism of the GMC-CCTFD, we design a least-squares adaptive filter in the Wigner distribution-MT domain. This allows us to achieve adaptive filtering denoising based on the GMC-CCTFD, yielding the least-squares adaptive filter-based GMC-CCTFD. Furthermore, we present several examples and apply the proposed filtering method to real-world datasets, demonstrating its superior noise suppression compared to several state-of-the-art methods.
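As background for the convolution-type construction, the sketch below computes a discrete (pseudo) Wigner-Ville distribution and applies a separable box smoothing in time and frequency; any such 2D convolution of the WVD yields a member of Cohen's class. This only illustrates the classical CCTFD the paper starts from, not its metaplectic generalization; the signal length and kernel size are arbitrary choices.

```python
import numpy as np

def wigner_ville(x):
    """Discrete pseudo Wigner-Ville distribution of a complex (analytic) signal."""
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        mmax = min(n, N - 1 - n)               # symmetric lags inside the signal
        acf = np.zeros(N, dtype=complex)
        for m in range(-mmax, mmax + 1):
            acf[m % N] = x[n + m] * np.conj(x[n - m])
        W[n] = np.fft.fft(acf).real            # FFT over the lag variable
    return W

def cohen_smooth(W, k=5):
    """Cohen's class member: separable box smoothing of the WVD in time and frequency."""
    ker = np.ones(k) / k
    Ws = np.apply_along_axis(lambda r: np.convolve(r, ker, mode="same"), 1, W)
    return np.apply_along_axis(lambda c: np.convolve(c, ker, mode="same"), 0, Ws)

# A pure tone at bin f0 concentrates at frequency index 2*f0 (the lag step doubles frequency).
N, f0 = 64, 8
x = np.exp(2j * np.pi * f0 * np.arange(N) / N)
W = wigner_ville(x)
```

Smoothing the WVD this way trades frequency concentration for suppression of cross-terms, which is exactly the tension the paper's adaptive design addresses.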
Vol. 6, pp. 348-368.
Citations: 0
Unsupervised Angularly Consistent 4D Light Field Segmentation Using Hyperpixels and a Graph Neural Network
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-25 · DOI: 10.1109/OJSP.2025.3545356
Maryam Hamad;Caroline Conti;Paulo Nunes;Luís Ducla Soares
Image segmentation is an essential initial stage in several computer vision applications. However, unsupervised image segmentation remains challenging in some cases, such as when objects with similar visual appearance overlap. Unlike 2D images, 4D Light Fields (LFs) convey both spatial and angular scene information, facilitating depth/disparity estimation, which can be further used to guide the segmentation. Existing 4D LF segmentation methods that target object-level (i.e., mid-level and high-level) segmentation are typically semi-supervised or supervised with ground-truth labels and mostly support only densely sampled 4D LFs. This paper proposes a novel unsupervised mid-level 4D LF Segmentation method using Graph Neural Networks (LFSGNN), which segments all LF views consistently. To achieve that, the 4D LF is represented as a hypergraph, whose hypernodes are obtained based on hyperpixel over-segmentation. Then, a graph neural network is used to extract deep features from the LF and assign segmentation labels to all hypernodes. Afterwards, the network parameters are updated iteratively via backpropagation to achieve better object separation. The proposed segmentation method supports both densely and sparsely sampled 4D LFs. Experimental results on synthetic and real 4D LF datasets show that the proposed method outperforms benchmark methods both in terms of segmentation spatial accuracy and angular consistency.
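The aggregate-then-cluster core of such a pipeline can be caricatured in a few lines: hyperpixel nodes with features, one degree-normalised message-passing step over the node graph, and a nearest-seed assignment. The graph, the features, and the seed choice below are synthetic stand-ins, not the paper's trained GNN.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim = 12, 4
# Two groups of "hyperpixel" nodes with well-separated mean features.
feats = np.vstack([rng.normal(0.0, 0.1, (6, dim)),
                   rng.normal(3.0, 0.1, (6, dim))])

# Chain adjacency inside each group, plus self-loops; the groups stay disconnected.
A = np.eye(n_nodes)
for i in range(n_nodes - 1):
    if i != 5:
        A[i, i + 1] = A[i + 1, i] = 1.0

# One GNN-style message-passing step: degree-normalised neighbourhood averaging.
smoothed = (A / A.sum(1, keepdims=True)) @ feats

# Assign every node to the nearest of two seed nodes (one per group).
seeds = smoothed[[0, 6]]
labels = np.argmin(((smoothed[:, None, :] - seeds[None, :, :]) ** 2).sum(-1), axis=1)
```

Because every hyperpixel spans all angular views, one label per hypernode automatically gives an angularly consistent segmentation, which is the structural point of the hypergraph representation.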
Vol. 6, pp. 333-347.
Citations: 0
Non-Stationary Delayed Combinatorial Semi-Bandit With Causally Related Rewards
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3545379
Saeed Ghoorchian;Steven Bilaj;Setareh Maghsudi
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. Besides, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
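As background, a minimal delayed combinatorial semi-bandit loop looks as follows: each round selects a super-arm of m base arms by a UCB1-style index, and per-arm (semi-bandit) feedback arrives only d rounds later. This toy is stationary Bernoulli with a fixed delay and no causal graph, so it deliberately omits the paper's actual setting; the arm means, delay, and horizon are arbitrary illustrative values.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
K, m, d, T = 5, 2, 3, 3000
p = np.array([0.2, 0.3, 0.5, 0.7, 0.9])   # true (unknown) Bernoulli means

counts, sums = np.zeros(K), np.zeros(K)
pending = deque()                          # rounds whose feedback is still delayed

for t in range(1, T + 1):
    # UCB1-style index; unexplored arms get +inf so they are tried first.
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1)
                   + np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    arms = np.argsort(-ucb)[:m]            # super-arm: the m largest indices
    pending.append((arms, rng.random(m) < p[arms]))
    if len(pending) > d:                   # semi-bandit feedback arrives d rounds late
        past_arms, past_r = pending.popleft()
        counts[past_arms] += 1
        sums[past_arms] += past_r
```

Even in this simplified form, the agent concentrates its plays on the two best base arms; the paper's contribution is handling non-stationarity and exploiting the causal structure among arm rewards on top of such a loop.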
Vol. 6, pp. 369-384.
Citations: 0
Extending Guided Filters Through Effective Utilization of Multi-Channel Guide Images Based on Singular Value Decomposition
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3545304
Kazu Mishiba
This paper proposes the SVD-based Guided Filter, designed to address key limitations of the original guided filter and its improved variants, enabling better use of multi-channel guide images. First, we analyzed the guided filter framework, reinterpreting it from a patch-based perspective using singular value decomposition (SVD). This revealed that the original guided filter suppresses oscillatory components based on their eigenvalues. Building on this insight, we proposed a new filtering method that selectively suppresses or enhances these components through functions that respond to their eigenvalues. The proposed SVD-based Guided Filter offers improved control over edge preservation and noise reduction compared to the original guided filter and its improved variants, which often struggle to balance these tasks. We validated the proposed method across various image processing applications, including denoising, edge-preserving smoothing, detail enhancement, and edge-enhancing smoothing. The results demonstrated that the SVD-based Guided Filter consistently outperforms the original guided filter and its improved variants by making more effective use of color guide images. While the computational cost is slightly higher than that of the original guided filter, the method remains efficient and highly effective. Overall, the proposed SVD-based Guided Filter delivers notable improvements, offering a solid foundation for further advancements in guided filtering techniques.
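For reference, here is a compact NumPy version of the original (single-channel) guided filter that this work analyzes and extends: per-window linear coefficients a and b are computed from local statistics and then averaged. The box-filter implementation and the parameter values are illustrative choices, not the paper's code.

```python
import numpy as np

def box(x, r):
    """Mean over a (2r+1)x(2r+1) window, edge-padded, via an integral image."""
    n = 2 * r + 1
    pad = np.pad(x, r, mode="edge")
    c = np.pad(np.cumsum(np.cumsum(pad, 0), 1), ((1, 0), (1, 0)))
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / (n * n)

def guided_filter(I, p, r=2, eps=1e-4):
    """Original guided filter: q = mean(a)*I + mean(b), with a,b fit per window."""
    mI, mp = box(I, r), box(p, r)
    a = (box(I * p, r) - mI * mp) / (box(I * I, r) - mI * mI + eps)
    b = mp - a * mI
    return box(a, r) * I + box(b, r)

flat = guided_filter(np.ones((10, 10)), np.ones((10, 10)))  # constants pass through
step = np.zeros((16, 16)); step[:, 8:] = 1.0
edge = guided_filter(step, step, r=2, eps=1e-6)             # small eps preserves edges
```

The eps term is what implicitly suppresses low-variance (oscillatory) components; the paper's SVD reinterpretation makes that suppression explicit and tunable per eigenvalue.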
Vol. 6, pp. 385-397.
Citations: 0
VITMST++: Efficient Hyperspectral Reconstruction Through Vision Transformer-Based Spatial Compression
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-02-24 · DOI: 10.1109/OJSP.2025.3544891
Ana C. Caznok Silveira;Diedre S. do Carmo;Lucas H. Ueda;Denis G. Fantinato;Paula D. P. Costa;Leticia Rittner
Hyperspectral channel reconstruction transforms a subsampled multispectral image into a hyperspectral image, providing higher spectral resolution without dedicated acquisition hardware. The Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (MST++) is a state-of-the-art channel reconstruction technique, but it faces memory limitations for high spatial-resolution images. In this context, we introduced VITMST++, a novel architecture incorporating Vision Transformer embeddings for spatial compression, multi-resolution image context, and a custom channel-weighted loss. Developed for the ICASSP 2024 HyperSkin Challenge, VITMST++ outperforms the state-of-the-art MST++ in both performance and computational efficiency in channel reconstruction. In this work, we perform a deeper analysis of the main aspects of VITMST++: efficiency, quantitative performance, and generalization to other datasets. Results show that VITMST++ achieves SAM and SSIM hyperspectral reconstruction metrics similar to those of state-of-the-art methods, while consuming up to threefold less memory and requiring up to 10 times fewer multiply-add operations.
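The SAM metric reported above has a standard definition: the angle between corresponding ground-truth and reconstructed spectra, averaged over pixels. A small sketch (the cube shapes here are hypothetical, not the challenge's):

```python
import numpy as np

def sam(x, y, eps=1e-8):
    """Spectral Angle Mapper: mean angle (radians) between spectra along the last axis."""
    num = (x * y).sum(-1)
    den = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    return float(np.arccos(np.clip(num / den, -1.0, 1.0)).mean())

cube_a = np.random.default_rng(0).random((4, 4, 31))  # H x W x spectral bands
cube_b = 2.0 * cube_a                                  # same spectra, different gain
```

Because SAM measures only the angle, it is invariant to per-pixel intensity scaling, which is why it is paired with SSIM when evaluating reconstructions.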
Vol. 6, pp. 398-404.
Citations: 0
Task Nuisance Filtration for Unsupervised Domain Adaptation
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536850
David Uliel;Raja Giryes
In unsupervised domain adaptation (UDA), labeled data is available for one domain (the Source Domain), generated according to some distribution, and unlabeled data is available for a second domain (the Target Domain), generated from a possibly different distribution but sharing the same task. The goal is to learn a model that performs well on the target domain although labels are available only for the source data. Many recent works attempt to align the source and target domains by matching their marginal distributions in a learned feature space. In this paper, we treat the domain difference as a nuisance and enable better adaptability of the domains by encouraging minimality of the target-domain representation, disentanglement of the features, and a smoother feature space that clusters the target data better. To this end, we use information bottleneck theory and a classical technique from the blind source separation framework, namely ICA (independent component analysis). We show that these concepts can improve the performance of leading domain adaptation methods on various domain adaptation benchmarks.
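A generic ICA-flavoured demonstration (not the paper's pipeline): two independent non-Gaussian sources are mixed, the mixtures are whitened, and a grid search over rotation angle maximises |excess kurtosis|, recovering an independent direction. The source distributions, mixing matrix, and grid-search estimator are all illustrative simplifications of ICA.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
s = np.vstack([rng.laplace(size=n),        # heavy-tailed source (excess kurtosis 3)
               rng.uniform(-1, 1, n)])     # light-tailed source (excess kurtosis -1.2)
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s                                  # observed mixtures

# Whitening: rotate/scale the mixtures to (approximately) identity covariance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)).T @ x

# Grid-search the residual rotation that maximises non-Gaussianity.
theta = np.linspace(0.0, np.pi, 360, endpoint=False)
kurts = np.array([np.mean((np.cos(t) * z[0] + np.sin(t) * z[1]) ** 4) - 3.0
                  for t in theta])
t_best = theta[np.argmax(np.abs(kurts))]
u = np.cos(t_best) * z[0] + np.sin(t_best) * z[1]
corr = abs(np.corrcoef(u, s[0])[0, 1])     # recovered vs. true heavy-tailed source
```

After whitening, the independent directions differ from the data axes only by a rotation, which is why a one-parameter search suffices in 2D; practical ICA (e.g. FastICA) replaces the grid with fixed-point iterations.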
Vol. 6, pp. 303-311.
Citations: 0
Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers
IF 2.9 · Q2 (Engineering, Electrical & Electronic) · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536853
Masahiro Kada;Ryota Yoshihashi;Satoshi Ikehata;Rei Kawakami;Ikuro Sato
Mixture-of-experts models with a sparse expert-selection rule have recently gained much attention because of their scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations, leading to degraded model performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which effectively penalizes the discontinuities occurring under natural deformations of input images. Combined with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.
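One plausible reading of a pairwise router-consistency penalty (a hedged sketch, not the paper's exact loss) is a symmetric KL divergence between the router's expert distributions for an input and its slightly deformed copy; the token/expert counts below are made up.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def router_consistency(logits_a, logits_b):
    """Mean symmetric KL divergence between per-token expert distributions."""
    p, q = softmax(logits_a), softmax(logits_b)
    kl = lambda a, b: (a * (np.log(a + 1e-12) - np.log(b + 1e-12))).sum(-1)
    return 0.5 * (kl(p, q) + kl(q, p)).mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 4))                 # 8 tokens routed over 4 experts
near = logits + 0.01 * rng.standard_normal((8, 4))   # router output for a mild deformation
far = rng.standard_normal((8, 4))                    # an unrelated routing
```

Minimising such a penalty pushes the router toward the same (soft) expert choice for perturbed views, smoothing the discontinuities that hard top-k selection would otherwise expose.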
Vol. 6, pp. 276-283.
Citations: 0
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-01-28 DOI: 10.1109/OJSP.2025.3534686
Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. Audio samples of the proposed intervention approach are available on our demo page.
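The probe-and-steer idea above can be sketched in a few lines: train a logistic-regression probe on head outputs from clips with and without a trait, then nudge a head's output along the probe's weight direction at inference. The numpy implementation below is a toy illustration under synthetic features; the feature dimension, training loop, and steering strength `alpha` are all assumptions, not the paper's settings.

```python
import numpy as np

def train_probe(X, y, lr=0.1, steps=500):
    """Tiny logistic-regression probe fit by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def probe_prob(h, w, b):
    """Probe's probability that the trait is present in head output h."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

# Synthetic stand-ins for one attention head's outputs on clips that
# exhibit (label 1) or lack (label 0) a musical trait, e.g. drums.
rng = np.random.default_rng(42)
d = 16
trait_dir = rng.normal(size=d)
X = np.vstack([rng.normal(size=(100, d)) + trait_dir,
               rng.normal(size=(100, d)) - trait_dir])
y = np.concatenate([np.ones(100), np.zeros(100)])

w, b = train_probe(X, y)
steer = w / np.linalg.norm(w)   # unit steering direction for this head

# Inference-time intervention: push a head output toward the trait.
alpha = 0.5                      # assumed intervention strength
h = rng.normal(size=d)           # a head output during generation
print(probe_prob(h, w, b), probe_prob(h + alpha * steer, w, b))
```

Monitoring the probe's probability on the steered output, as in the second printed value, is what lets the strength of the intervention be capped before the generation becomes incoherent.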
{"title":"SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers","authors":"Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux","doi":"10.1109/OJSP.2025.3534686","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534686","url":null,"abstract":"We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. 
Audio samples of the proposed intervention approach are available on our <underline>demo page</u>.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"266-275"},"PeriodicalIF":2.9,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10856829","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Auditory EEG Decoding Challenge for ICASSP 2024
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-01-27 DOI: 10.1109/OJSP.2025.3534122
Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were used as held-out subjects for the evaluation step of the challenge.
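The match-mismatch task described above — deciding which of five speech segments induced a given EEG segment — can be illustrated with a deliberately simple correlation baseline: score each candidate speech envelope by its Pearson correlation with the channel-averaged EEG and pick the best. This numpy sketch uses synthetic data and is not the challenge's reference model; channel count, segment length, and the 0.5 mixing weight are all assumptions for illustration.

```python
import numpy as np

def match_mismatch(eeg, candidates):
    """Pick which candidate speech envelope best matches an EEG segment.

    eeg: array of shape (channels, time); candidates: list of (time,)
    envelopes. Scores each candidate by Pearson correlation with the
    channel-averaged EEG and returns (best index, all scores).
    """
    sig = eeg.mean(axis=0)  # collapse channels to one time series
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [corr(sig, c) for c in candidates]
    return int(np.argmax(scores)), scores

# Toy data: 64-channel EEG that weakly tracks candidate envelope #2.
rng = np.random.default_rng(1)
T = 320
true_env = rng.random(T)
eeg = 0.5 * true_env + rng.normal(size=(64, T))  # signal + channel noise
envelopes = [rng.random(T) for _ in range(5)]
envelopes[2] = true_env

picked, scores = match_mismatch(eeg, envelopes)
print(picked)  # recovers the matching segment's index (2 here)
```

Challenge submissions replace the raw correlation with learned encoders, but the decision rule — score five candidates against one EEG segment and take the argmax — has the same shape.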
{"title":"Auditory EEG Decoding Challenge for ICASSP 2024","authors":"Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart","doi":"10.1109/OJSP.2025.3534122","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534122","url":null,"abstract":"This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were used as held-out subjects for the evaluation step of the challenge.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"478-488"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854651","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0