
Latest Publications: IEEE Transactions on Image Processing, a publication of the IEEE Signal Processing Society

Generating Stylized Features for Single-Source Cross-Dataset Palmprint Recognition With Unseen Target Dataset
Huikai Shao;Pengxu Li;Dexing Zhong
As a promising topic in palmprint recognition, cross-dataset palmprint recognition is attracting increasing research interest. In this paper, a more difficult yet realistic scenario is studied, i.e., Single-Source Cross-Dataset Palmprint Recognition with an Unseen Target dataset (S2CDPR-UT). The goal is to generalize a palmprint feature extractor trained only on a single source dataset to multiple unseen target datasets collected by different devices or in different environments. To meet this challenge, we propose a novel method to improve the generalization of the feature extractor for S2CDPR-UT, named Generating stylIzed FeaTures (GIFT). First, the raw features are decoupled into high- and low-frequency components. Then, a feature stylization module is constructed to perturb the mean and variance of the low-frequency components to generate more stylized features, which can provide more valuable knowledge. Furthermore, two supervisions, diversity enhancement and consistency preservation, are introduced at the feature level to help learn the model. The former enhances the diversity of the stylized features to expand the feature space, while the latter maintains semantic consistency to ensure accurate palmprint recognition. Extensive experiments carried out on the CASIA Multi-Spectral, XJTU-UP, and MPD palmprint databases show that our GIFT method achieves significant performance improvements over other methods. The code will be released at https://github.com/HuikaiShao/GIFT.
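The abstract describes the stylization step only at a high level. As a rough illustration, the PyTorch sketch below perturbs the channel-wise mean and standard deviation of a low-frequency feature component (approximated here with average pooling); the function names, the pooling-based frequency split, and the Gaussian perturbation scale are assumptions for illustration, not the released GIFT code.

```python
# Minimal sketch of low-frequency feature stylization (illustrative, not the authors' implementation).
import torch
import torch.nn.functional as F

def stylize_low_frequency(feat: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """Perturb the channel-wise mean/std of the low-frequency part of a (B, C, H, W) feature map."""
    # Low-pass via average pooling; the high-frequency residual is kept unchanged.
    low = F.avg_pool2d(feat, kernel_size=3, stride=1, padding=1)
    high = feat - low

    # Channel-wise statistics of the low-frequency component.
    mu = low.mean(dim=(2, 3), keepdim=True)            # (B, C, 1, 1)
    sigma = low.std(dim=(2, 3), keepdim=True) + 1e-6   # (B, C, 1, 1)

    # Perturb the statistics with Gaussian noise to synthesize new "styles".
    mu_new = mu * (1.0 + noise_std * torch.randn_like(mu))
    sigma_new = sigma * (1.0 + noise_std * torch.randn_like(sigma))

    # Re-normalize, re-style the low-frequency component, then add back the high-frequency part.
    low_stylized = (low - mu) / sigma * sigma_new + mu_new
    return low_stylized + high

features = torch.randn(4, 256, 16, 16)   # toy backbone features
stylized = stylize_low_frequency(features)
```

The diversity-enhancement and consistency-preservation supervisions described in the abstract would then be applied between `features` and `stylized` during training.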
Citations: 0
Rethinking Self-Training for Semi-Supervised Landmark Detection: A Selection-Free Approach
Haibo Jin;Haoxuan Che;Hao Chen
Self-training is a simple yet effective method for semi-supervised learning, in which pseudo-label selection plays an important role in handling confirmation bias. Despite its popularity, applying self-training to landmark detection faces three problems: 1) the selected confident pseudo-labels often contain data bias, which may hurt model performance; 2) it is not easy to choose a proper threshold for sample selection, as the localization task can be sensitive to noisy pseudo-labels; 3) coordinate regression does not output confidence, making selection-based self-training infeasible. To address the above issues, we propose Self-Training for Landmark Detection (STLD), a method that does not require explicit pseudo-label selection. Instead, STLD constructs a task curriculum to deal with confirmation bias, progressively transitioning from more confident to less confident tasks over the rounds of self-training. Pseudo pretraining and shrink regression are two essential components of this curriculum: the former is the first task of the curriculum and provides a better model initialization, while the latter is added in later rounds to directly leverage the pseudo-labels in a coarse-to-fine manner. Experiments on three facial landmark detection benchmarks and one medical landmark detection benchmark show that STLD consistently outperforms existing methods in both semi- and omni-supervised settings. The code is available at https://github.com/jhb86253817/STLD.
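Since coordinate regression provides no confidence scores, the selection-free idea can be pictured as pseudo-labeling every unlabeled sample and scheduling how strongly those pseudo-labels are used across rounds. The toy loop below sketches that round structure; the network, the weighting schedule, and the loss choices are placeholders, not the STLD recipe.

```python
# Selection-free self-training skeleton for landmark regression (toy example).
import torch
import torch.nn as nn

class TinyLandmarkNet(nn.Module):
    """Toy regressor: image -> 2D coordinates of K landmarks."""
    def __init__(self, k: int = 5):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                                 nn.ReLU(), nn.Linear(64, 2 * k))
    def forward(self, x):
        return self.net(x)

labeled_x, labeled_y = torch.randn(16, 3, 32, 32), torch.rand(16, 10)
unlabeled_x = torch.randn(64, 3, 32, 32)

student = TinyLandmarkNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for round_idx in range(3):                       # curriculum over self-training rounds
    # Teacher = snapshot of the current student; no confidence thresholding:
    # every unlabeled sample receives a pseudo-label.
    with torch.no_grad():
        pseudo_y = student(unlabeled_x)
    # Later rounds lean more heavily on the pseudo-labels (coarse-to-fine idea).
    pseudo_weight = 0.2 * (round_idx + 1)
    for _ in range(10):                          # a few gradient steps per round
        loss = nn.functional.l1_loss(student(labeled_x), labeled_y) \
             + pseudo_weight * nn.functional.l1_loss(student(unlabeled_x), pseudo_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```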
Citations: 0
Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration
Tianshui Chen;Weihang Wang;Tao Pu;Jinghui Qin;Zhijing Yang;Jie Liu;Liang Lin
Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization. We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios.
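The key mechanism is replacing uniform label smoothing with category-correlation-aware soft labels. The sketch below builds such adaptive label vectors from per-category prototype similarities; the prototype source, the cosine-based correlation, and the smoothing rule are illustrative assumptions rather than the exact DCLR losses.

```python
# Sketch of correlation-aware soft labels for multi-label calibration (illustrative).
import torch
import torch.nn.functional as F

def adaptive_label_vectors(hard_labels, prototypes, smooth: float = 0.1):
    """hard_labels: (B, C) multi-hot targets; prototypes: (C, D) per-category embeddings."""
    # Category-to-category correlation from prototype cosine similarity.
    proto = F.normalize(prototypes, dim=1)
    corr = (proto @ proto.t()).clamp(min=0)        # (C, C), non-negative
    corr = corr / corr.sum(dim=1, keepdim=True)    # row-normalized

    # Spread a small smoothing mass toward correlated categories instead of uniformly.
    spread = hard_labels @ corr                    # (B, C)
    soft = (1 - smooth) * hard_labels + smooth * spread
    return soft.clamp(max=1.0)

logits = torch.randn(8, 20)
targets = (torch.rand(8, 20) > 0.8).float()
prototypes = torch.randn(20, 128)                  # stand-in for learned category prototypes
soft_targets = adaptive_label_vectors(targets, prototypes)
loss = F.binary_cross_entropy_with_logits(logits, soft_targets)
```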
Citations: 0
IdeNet: Making Neural Network Identify Camouflaged Objects Like Creatures
Juwei Guan;Xiaolin Fang;Tongxin Zhu;Zhipeng Cai;Zhen Ling;Ming Yang;Junzhou Luo
Camouflaged objects often blend in with their surroundings, making the perception of a camouflaged object a more complex procedure. However, most neural-network-based methods that simulate the visual information processing pathway of creatures only roughly define the general process, which insufficiently reproduces the process of identifying camouflaged objects. How to make modeled neural networks perceive camouflaged objects as effectively as creatures do is a significant topic that deserves further consideration. After a meticulous analysis of biological visual information processing, we propose an end-to-end prudent and comprehensive neural network, termed IdeNet, to model the critical information processing. Specifically, IdeNet divides the entire perception process into five stages: information collection, information augmentation, information filtering, information localization, and information correction and object identification. In addition, we design tailored visual information processing mechanisms for each stage, including the information augmentation module (IAM), the information filtering module (IFM), the information localization module (ILM), and the information correction module (ICM), to model the critical visual information processing and establish the inextricable association between biological behavior and visual information processing. Extensive experiments show that IdeNet outperforms state-of-the-art methods on all benchmarks, demonstrating the effectiveness of the five-stage partitioning of the visual information processing pathway and the tailored visual information processing mechanisms for camouflaged object detection. Our code is publicly available at: https://github.com/whyandbecause/IdeNet.
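For readers who want the pipeline shape at a glance, the skeleton below strings the five stages together as PyTorch modules; every stage body is a placeholder convolution, since the actual IAM/IFM/ILM/ICM designs are only described in the paper.

```python
# Structural sketch of the five-stage perception pipeline (stage internals are placeholders).
import torch
import torch.nn as nn

class FiveStagePipeline(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.collect = nn.Conv2d(3, channels, 3, padding=1)          # information collection (backbone stub)
        self.augment = nn.Conv2d(channels, channels, 3, padding=1)   # IAM: information augmentation
        self.filter = nn.Conv2d(channels, channels, 3, padding=1)    # IFM: information filtering
        self.localize = nn.Conv2d(channels, channels, 3, padding=1)  # ILM: information localization
        self.correct = nn.Conv2d(channels, 1, 3, padding=1)          # ICM: correction + identification head

    def forward(self, image):
        x = self.collect(image)
        x = self.augment(x)
        x = self.filter(x)
        x = self.localize(x)
        return torch.sigmoid(self.correct(x))   # per-pixel camouflaged-object mask

mask = FiveStagePipeline()(torch.randn(1, 3, 256, 256))
```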
Citations: 0
Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation
Jisheng Dang;Huicheng Zheng;Xiaohao Xu;Longguang Wang;Yulan Guo
Current video object segmentation approaches primarily rely on frame-wise appearance information to perform matching. Despite significant progress, reliable matching becomes challenging due to rapid changes of the object’s appearance over time. Moreover, previous matching mechanisms suffer from redundant computation and noise interference as the number of accumulated frames increases. In this paper, we introduce a multi-frame spatio-temporal context memory (STCM) network to exploit discriminative spatio-temporal cues in multiple adjacent frames by utilizing a multi-frame context interaction module (MCI) for memory construction. Based on the proposed MCI module, a sparse group memory reader is developed to enable efficient sparse matching during memory reading. Our proposed method is generic and achieves state-of-the-art performance with real-time speed on benchmark datasets such as DAVIS and YouTube-VOS. In addition, our model exhibits robustness to sparse videos with low frame rates.
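Sparse matching during memory reading is commonly realized by keeping only the top-k affinities per query location. The sketch below shows that pattern; the tensor layout and the top-k choice are assumptions, and the paper's multi-frame context interaction and group structure are not reproduced.

```python
# Sketch of sparse memory reading via top-k affinity (illustrative).
import torch
import torch.nn.functional as F

def sparse_memory_read(query, mem_key, mem_value, k: int = 16):
    """query: (B, C, Q); mem_key: (B, C, M); mem_value: (B, C2, M)."""
    affinity = torch.einsum('bcq,bcm->bqm', query, mem_key)    # (B, Q, M)
    topk_vals, topk_idx = affinity.topk(k, dim=-1)             # keep the k best matches per query
    weights = F.softmax(topk_vals, dim=-1)                     # (B, Q, k)

    c2 = mem_value.shape[1]
    q_len = query.shape[-1]
    idx = topk_idx.unsqueeze(1).expand(-1, c2, -1, -1)         # (B, C2, Q, k)
    vals = mem_value.unsqueeze(2).expand(-1, -1, q_len, -1)    # (B, C2, Q, M)
    gathered = vals.gather(3, idx)                             # (B, C2, Q, k)
    return (gathered * weights.unsqueeze(1)).sum(dim=-1)       # (B, C2, Q)

query = torch.randn(2, 64, 900)                                # current-frame query features
mem_key, mem_value = torch.randn(2, 64, 3600), torch.randn(2, 128, 3600)
readout = sparse_memory_read(query, mem_key, mem_value)
```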
Citations: 0
RoMo: Robust Unsupervised Multimodal Learning with Noisy Pseudo Labels
Yongxiang Li;Yang Qin;Yuan Sun;Dezhong Peng;Xi Peng;Peng Hu

The rise of the metaverse and the increasing volume of heterogeneous 2D and 3D data have led to a growing demand for cross-modal retrieval, which allows users to query semantically relevant data across different modalities. Existing methods rely heavily on class labels to bridge semantic correlations, but it is expensive or even impossible to collect large-scale well-labeled data in practice, which makes unsupervised learning more attractive and practical. However, due to the lack of label information, it is challenging for unsupervised cross-modal learning to bridge semantic correlations across different modalities, which inevitably leads to unreliable discrimination. Based on these observations, we reveal and study a novel problem in this paper, namely unsupervised cross-modal learning with noisy pseudo labels. To address this problem, we propose a 2D-3D unsupervised multimodal learning framework that harnesses multimodal data. Our framework consists of three key components: 1) the Self-matching Supervision Mechanism (SSM) warms up the model to encapsulate discrimination into the representations in a self-supervised manner. 2) Robust Discriminative Learning (RDL) further mines the discrimination from the learned imperfect predictions after warming up. To tackle the noise in the predicted pseudo labels, RDL leverages a novel Robust Concentrating Learning Loss (RCLL) to alleviate the influence of uncertain samples, thus embracing robustness against noisy pseudo labels. 3) The Modality-invariance Learning Mechanism (MLM) minimizes the cross-modal discrepancy to enforce SSM and RDL to produce common representations. We perform comprehensive experiments on four 2D-3D multimodal datasets, comparing our method against 14 state-of-the-art approaches, thereby demonstrating its effectiveness and superiority.
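The abstract does not give the form of the Robust Concentrating Learning Loss. As a generic illustration of softening the influence of uncertain samples, the snippet below down-weights each pseudo-labeled sample by the model's own confidence in that pseudo-label; this is a stand-in for the idea, not the paper's RCLL.

```python
# Generic confidence-weighted pseudo-label loss (illustrative, not the paper's RCLL).
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(logits, pseudo_labels, temperature: float = 1.0):
    """logits: (B, K); pseudo_labels: (B,) integer pseudo-class assignments."""
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction='none')   # (B,)
    # Confidence in the pseudo-label = probability the model assigns to it.
    conf = F.softmax(logits.detach() / temperature, dim=1) \
            .gather(1, pseudo_labels[:, None]).squeeze(1)
    # Uncertain samples (low confidence) contribute less to the update.
    return (conf * per_sample).mean()

loss = weighted_pseudo_label_loss(torch.randn(32, 10), torch.randint(0, 10, (32,)))
```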

Citations: 0
Spatio-Temporal Convolutional Neural Network for Enhanced Inter Prediction in Video Coding
Philipp Merkle;Martin Winken;Jonathan Pfaff;Heiko Schwarz;Detlev Marpe;Thomas Wiegand
This paper presents a convolutional neural network (CNN)-based enhancement to inter prediction in Versatile Video Coding (VVC). Our approach aims at improving the prediction signal of inter blocks with a residual CNN that incorporates spatial and temporal reference samples. It is motivated by the theoretical consideration that neural network-based methods have a higher degree of signal adaptivity than conventional signal processing methods and that spatially neighboring reference samples have the potential to improve the prediction signal by adapting it to the reconstructed signal in its immediate vicinity. We show that adding a polyphase decomposition stage to the CNN results in a significantly better trade-off between computational complexity and coding performance. Incorporating spatial reference samples in the inter prediction process is challenging: The fact that the input of the CNN for one block may depend on the output of the CNN for preceding blocks prohibits parallel processing. We solve this by introducing a novel signal plane that contains specifically constrained reference samples, enabling parallel decoding while maintaining a high compression efficiency. Overall, experimental results show average bit rate savings of 4.07% and 3.47% for the random access (RA) and low-delay B (LB) configurations of the JVET common test conditions, respectively.
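The polyphase decomposition stage can be pictured as a space-to-depth rearrangement that lets the convolutional trunk run at a quarter of the spatial resolution before the result is recomposed as a residual correction. The sketch below shows that structure with `pixel_unshuffle`/`pixel_shuffle`; the layer sizes are illustrative, and the spatial and temporal reference samples the paper also feeds in are omitted.

```python
# Sketch of a polyphase decomposition stage in front of a prediction-enhancement CNN (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyphaseResidualCNN(nn.Module):
    def __init__(self, in_ch: int = 1, hidden: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch * 4, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, in_ch * 4, 3, padding=1),
        )

    def forward(self, inter_pred):
        # Polyphase (space-to-depth) decomposition: (B, C, H, W) -> (B, 4C, H/2, W/2).
        x = F.pixel_unshuffle(inter_pred, downscale_factor=2)
        residual = self.trunk(x)
        # Recompose to full resolution and add as a residual correction to the prediction signal.
        return inter_pred + F.pixel_shuffle(residual, upscale_factor=2)

refined = PolyphaseResidualCNN()(torch.randn(1, 1, 64, 64))
```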
Citations: 0
Facial Action Unit Representation based on Self-supervised Learning with Ensembled Priori Constraints
Haifeng Chen;Peng Zhang;Chujia Guo;Ke Lu;Dongmei Jiang

Facial action units (AUs) cover a comprehensive set of atomic facial muscle movements for understanding human expressions. With supervised learning, discriminative AU representations can be obtained from the local patches where the AUs are located. Unfortunately, accurate AU localization and characterization are challenged by the tremendous cost of manual annotation, which limits the performance of AU recognition in realistic scenarios. In this study, we propose an end-to-end self-supervised AU representation learning model (SsupAU) to learn AU representations from unlabeled facial videos. Specifically, the input face is decomposed into six components using autoencoders: five photo-geometrically meaningful components, together with 2D flow-field AUs. By gradually constructing the canonical neutral face, the posed neutral face, and the posed expressional face, these components can be disentangled without supervision, and therefore the AU representations can be learned. To construct the canonical neutral face without manually labeled ground truth of emotion state or AU intensity, two assumptions based on priori knowledge are proposed: 1) identity consistency, which exploits the identical albedos and depths of different frames in a face video and helps to learn the camera color mode as an extra cue for canonical neutral face recovery; 2) average face, which enables the model to discover a 'neutral facial expression' of the canonical neutral face and decouple the AUs in representation learning. To the best of our knowledge, this is the first attempt to design a self-supervised AU representation learning method based on the definition of AUs. Substantial experiments on benchmark datasets demonstrate the superior performance of the proposed work in comparison to other state-of-the-art approaches, as well as an outstanding capability of decomposing the input face into meaningful factors for reconstruction. The code is made available at https://github.com/Sunner4nwpu/SsupAU.
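The identity-consistency assumption can be written directly as a loss: two frames from the same face video should decode to the same albedo and depth. The toy encoder below is a stand-in for the paper's autoencoders and only illustrates how that constraint would be imposed.

```python
# Sketch of the identity-consistency assumption as a training loss (illustrative encoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFaceEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.albedo_head = nn.Conv2d(16, 3, 3, padding=1)
        self.depth_head = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, x):
        h = self.backbone(x)
        return self.albedo_head(h), self.depth_head(h)

encoder = ToyFaceEncoder()
# Two different frames sampled from the same face video.
frame_a, frame_b = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
albedo_a, depth_a = encoder(frame_a)
albedo_b, depth_b = encoder(frame_b)
# The person is the same, so albedo and depth should agree across frames.
identity_consistency_loss = F.l1_loss(albedo_a, albedo_b) + F.l1_loss(depth_a, depth_b)
```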

Citations: 0
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
Haisheng Fu;Feng Liang;Jie Liang;Yongqiang Wang;Zhenman Fang;Guohe Zhang;Jingning Han
Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high and not suitable for many practical applications. In this paper, we propose four techniques to balance the trade-off between complexity and performance. We first introduce the deformable residual module to remove more redundancies in the input image, thereby enhancing compression performance. Second, we design an improved checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop a three-pass knowledge distillation scheme that transfers both the final and intermediate results of the teacher network to the student network, allowing us to retrain the decoder and entropy coding and to reduce the complexity of the core decoder network. Fourth, we introduce $L_{1}$ regularization to make the numerical values of the latent representation more sparse, and we only encode non-zero channels in the encoding and decoding process to reduce the bit rate, which also reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method can be about 20 times faster in encoding and 70-90 times faster in decoding, while our R-D performance is also 2.3% higher. Our method achieves better rate-distortion performance than classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods, as measured by both PSNR and MS-SSIM metrics on the Kodak and Tecnick-40 datasets.
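The fourth technique can be pictured as adding an L1 penalty on the latent during training and then skipping channels that quantize to all zeros at coding time. The snippet below sketches both pieces; the entropy model and rate term are omitted, and all names are illustrative.

```python
# Sketch of L1-sparse latents and non-zero channel selection (illustrative).
import torch

def rd_loss_with_sparsity(latent: torch.Tensor, distortion: torch.Tensor, l1_weight: float = 1e-4):
    """latent: (B, C, H, W) analysis-transform output; distortion: scalar reconstruction loss."""
    return distortion + l1_weight * latent.abs().mean()        # rate term of the entropy model omitted

def nonzero_channel_mask(quantized_latent: torch.Tensor) -> torch.Tensor:
    """Channels whose quantized values are all zero need not be encoded or decoded."""
    return quantized_latent.abs().sum(dim=(0, 2, 3)) > 0       # (C,) boolean mask

y = torch.randn(1, 192, 16, 16)                                # toy latent
y_hat = torch.round(y * torch.linspace(0.0, 1.0, 192).view(1, -1, 1, 1))  # toy quantization, many zero channels
mask = nonzero_channel_mask(y_hat)
loss = rd_loss_with_sparsity(y, ((y - y_hat) ** 2).mean())
print(f"channels to encode: {int(mask.sum())} / {y_hat.shape[1]}")
```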
Citations: 0
Decouple Ego-View Motions for Predicting Pedestrian Trajectory and Intention
Zhengming Zhang;Zhengming Ding;Renran Tian
Pedestrian trajectory prediction is a critical component of autonomous driving in urban environments, allowing vehicles to anticipate pedestrian movements and facilitate safer interactions. While egocentric-view-based algorithms can reduce the sensing and computation burdens of 3D scene reconstruction, accurately predicting pedestrian trajectories and interpreting their intentions from this perspective requires a better understanding of the coupled vehicle (camera) and pedestrian motions, which has not been adequately addressed by existing models. In this paper, we present a novel egocentric pedestrian trajectory prediction approach that uses a two-tower structure and multi-modal inputs. One tower, the vehicle module, receives only the initial pedestrian position and ego-vehicle actions and speed, while the other, the pedestrian module, receives additional prior pedestrian trajectory and visual features. Our proposed action-aware loss function allows the two-tower model to decompose pedestrian trajectory predictions into two parts, caused by ego-vehicle movement and pedestrian movement, respectively, even when only trained on combined ego-view motions. This decomposition increases model flexibility and provides a better estimation of pedestrian actions and intentions, enhancing overall performance. Experiments on three publicly available benchmark datasets show that our proposed model outperforms all existing algorithms in ego-view pedestrian trajectory prediction accuracy.
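The decomposition can be pictured as the sum of two displacement predictions, one produced by the vehicle tower from ego-motion inputs and one produced by the pedestrian tower from the richer inputs. The sketch below shows that structure; the tower architectures, input sizes, and prediction horizon are placeholders, and the visual features used in the paper are omitted.

```python
# Structural sketch of the two-tower trajectory decomposition (illustrative).
import torch
import torch.nn as nn

class TwoTowerTrajectoryModel(nn.Module):
    def __init__(self, horizon: int = 12):
        super().__init__()
        # Vehicle tower: initial pedestrian position + ego-vehicle actions and speed.
        self.vehicle_tower = nn.Sequential(nn.Linear(2 + 3, 64), nn.ReLU(), nn.Linear(64, horizon * 2))
        # Pedestrian tower: additionally sees the past pedestrian trajectory (visual features omitted).
        self.pedestrian_tower = nn.Sequential(nn.Linear(2 + 3 + 16, 64), nn.ReLU(), nn.Linear(64, horizon * 2))
        self.horizon = horizon

    def forward(self, init_pos, ego_motion, past_traj):
        veh_in = torch.cat([init_pos, ego_motion], dim=1)
        ped_in = torch.cat([init_pos, ego_motion, past_traj], dim=1)
        d_vehicle = self.vehicle_tower(veh_in).view(-1, self.horizon, 2)         # camera-motion part
        d_pedestrian = self.pedestrian_tower(ped_in).view(-1, self.horizon, 2)   # pedestrian-motion part
        # Final ego-view trajectory = sum of the two displacement parts.
        return d_vehicle + d_pedestrian, d_vehicle, d_pedestrian

model = TwoTowerTrajectoryModel()
pred, d_veh, d_ped = model(torch.randn(4, 2), torch.randn(4, 3), torch.randn(4, 16))
```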
Citations: 0