
Latest Publications in IEEE Transactions on Image Processing

Texture-Consistent 3D Scene Style Transfer via Transformer-Guided Neural Radiance Fields.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-04 | DOI: 10.1109/tip.2025.3626892
Wudi Chen, Zhiyuan Zha, Shigang Wang, Liaqat Ali, Bihan Wen, Xin Yuan, Jiantao Zhou, Ce Zhu
Recent advancements have suggested that neural radiance fields (NeRFs) show great potential in 3D style transfer. However, most existing NeRF-based style transfer methods still face considerable challenges in generating stylized images that simultaneously preserve clear scene textures and maintain strong cross-view consistency. To address these limitations, in this paper, we propose a novel transformer-guided approach for 3D scene style transfer. Specifically, we first design a transformer-based style transfer network to capture long-range dependencies and generate 2D stylized images with initial consistency, which serve as supervision for the 3D stylized generation. To enable fine-grained control over style, we propose a latent style vector as a conditional feature and design a style network that projects this style information into the 3D space. We further develop a merge network that integrates style features with scene geometry to render 3D stylized images that are both visually coherent and stylistically consistent. In addition, we propose a texture consistency loss to preserve scene structure and enhance texture fidelity across views. Extensive quantitative and qualitative experimental results demonstrate that our proposed approach outperforms many state-of-the-art methods in terms of visual perception, image quality and multi-view consistency. Our code and more results are available at: https://github.com/PaiDii/TGTC-Style.git.
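As a rough illustration of the texture consistency idea described in the abstract, the sketch below shows a minimal PyTorch gradient-matching loss between a stylized render and the corresponding photorealistic render. The function names and the specific formulation (finite-difference gradients with an L1 penalty) are assumptions made for illustration, not the authors' released implementation at the linked repository.

```python
# Hypothetical sketch of a texture-consistency-style loss (not the authors' code):
# penalize differences between spatial gradients of the stylized render and the
# photorealistic render so scene structure and texture edges survive stylization.
import torch
import torch.nn.functional as F

def image_gradients(img):
    """Finite-difference gradients of an NCHW image tensor."""
    dx = img[..., :, 1:] - img[..., :, :-1]   # horizontal differences
    dy = img[..., 1:, :] - img[..., :-1, :]   # vertical differences
    return dx, dy

def texture_consistency_loss(stylized, content):
    """L1 distance between gradient maps of the stylized and content renders."""
    sdx, sdy = image_gradients(stylized)
    cdx, cdy = image_gradients(content)
    return F.l1_loss(sdx, cdx) + F.l1_loss(sdy, cdy)

# Usage with dummy renders of shape (batch, 3, H, W)
stylized = torch.rand(2, 3, 64, 64, requires_grad=True)
content = torch.rand(2, 3, 64, 64)
loss = texture_consistency_loss(stylized, content)
loss.backward()
```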
Citations: 0
Probability Map-Guided Network for 3D Volumetric Medical Image Segmentation.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-04 | DOI: 10.1109/tip.2025.3623259
Zhiqin Zhu, Zimeng Zhang, Guanqiu Qi, Yuanyuan Li, Pan Yang, Yu Liu
3D medical images are volumetric data that provide spatial continuity and multi-dimensional information. These features provide rich anatomical context. However, their anisotropy may result in reduced image detail along certain directions. This can cause blurring or distortion between slices. In addition, global or local intensity inhomogeneities are often observed. This may be due to limitations of the imaging equipment, inappropriate scanning parameters, or variations in the patient's anatomy. This inhomogeneity may blur lesion boundaries and may also mask true features, causing the model to focus on irrelevant regions. Therefore, a probability map-guided network for 3D volumetric medical image segmentation (3D-PMGNet) is proposed. The probability maps generated from the intermediate features are used as supervisory signals to guide the segmentation process. A new probability map reconstruction method is designed, combining dynamic thresholding with local adaptive smoothing. This enhances the reliability of high-response regions while suppressing low-response noise. A learnable channel-wise temperature coefficient is introduced to adjust the probability distribution to make it closer to the true distribution; in addition, a feature fusion method based on dynamic prompt encoding is developed. The response strength of the main feature maps is dynamically adjusted, and this adjustment is achieved through the spatial position encoding derived from the probability maps. The proposed method has been evaluated on four datasets. Experimental results show that the proposed method outperforms state-of-the-art 3D medical image segmentation methods. The source codes have been publicly released at https://github.com/ZHANGZIMENG01/3D-PMGNet.
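To make the probability-map reconstruction idea concrete, here is a minimal, hypothetical PyTorch sketch combining a learnable channel-wise temperature, a dynamic (mean plus k·std) threshold, and local smoothing of low-response voxels. The class name, the threshold rule, and the 3x3x3 smoothing kernel are illustrative assumptions rather than the released 3D-PMGNet code.

```python
# Hypothetical sketch of probability-map reconstruction with a learnable
# channel-wise temperature, dynamic thresholding, and local adaptive smoothing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbMapReconstruction(nn.Module):
    def __init__(self, num_classes, k=0.5):
        super().__init__()
        # one temperature per class channel, learned jointly with the network
        self.log_temp = nn.Parameter(torch.zeros(1, num_classes, 1, 1, 1))
        self.k = k  # controls how far above the mean the dynamic threshold sits

    def forward(self, logits):
        # logits: (N, C, D, H, W) intermediate class scores
        temp = self.log_temp.exp()              # positive channel-wise temperature
        prob = F.softmax(logits / temp, dim=1)  # temperature-scaled probabilities
        # dynamic threshold per sample/class from the probability statistics
        mean = prob.mean(dim=(2, 3, 4), keepdim=True)
        std = prob.std(dim=(2, 3, 4), keepdim=True)
        thresh = mean + self.k * std
        # local adaptive smoothing: average-pool and keep the smoothed value
        # only in low-response regions, preserving confident high responses
        smoothed = F.avg_pool3d(prob, kernel_size=3, stride=1, padding=1)
        return torch.where(prob >= thresh, prob, smoothed)

# Usage on a dummy volume
net = ProbMapReconstruction(num_classes=4)
prob_map = net(torch.randn(1, 4, 16, 32, 32))
```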
Citations: 0
Constructing Per-Shot Bitrate Ladders using Visual Information Fidelity
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-31 | DOI: 10.1109/tip.2025.3625750
Krishna Srikar Durbha, Alan C. Bovik
{"title":"Constructing Per-Shot Bitrate Ladders using Visual Information Fidelity","authors":"Krishna Srikar Durbha, Alan C. Bovik","doi":"10.1109/tip.2025.3625750","DOIUrl":"https://doi.org/10.1109/tip.2025.3625750","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"126 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145412124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-30 | DOI: 10.1109/tip.2025.3625380
Siyue Yao, Mingjie Sun, Eng Gee Lim, Ran Yi, Baojiang Zhong, Moncef Gabbouj
{"title":"Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios","authors":"Siyue Yao, Mingjie Sun, Eng Gee Lim, Ran Yi, Baojiang Zhong, Moncef Gabbouj","doi":"10.1109/tip.2025.3625380","DOIUrl":"https://doi.org/10.1109/tip.2025.3625380","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"12 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145404441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Semi-smooth Newton-driven Unfolding Network for Multi-modal Image Super-Resolution.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-30 | DOI: 10.1109/tip.2025.3625429
Chenxiao Zhang, Xin Deng, Jingyi Xu, Yongxuan Dou, Mai Xu
Deep unfolding has emerged as a powerful solution for Multi-modal Image Super-Resolution (MISR) through strategic integration of cross-modal priors in network architecture. However, current deep unfolding approaches rely on first-order optimization, which exhibits limitations in learning efficiency and reconstruction accuracy. In this paper, to overcome these limitations, we propose a novel Semi-smooth Newton-driven Unfolding network for MISR, namely SNUM-Net. Specifically, we first develop a Semi-smooth Newton-driven MISR (SNM) algorithm that establishes a theoretical foundation for our approach. Then, we unfold the iterative solution of SNM into a novel network. To the best of our knowledge, the SNUM-Net is the first successful attempt to design a deep unfolding MISR network based on a second-order optimization algorithm. Compared to existing methods, the SNUM-Net demonstrates three main advantages. 1) Universal paradigm: the SNUM-Net provides a unified paradigm for diverse MISR tasks without requiring scenario-specific constraints; 2) Explainable framework: the network preserves a mathematical correspondence with the SNM algorithm, ensuring that the topological relationships between modules are well explainable; 3) Superior performance: comprehensive evaluations across 10 datasets spanning 3 MISR tasks demonstrate the network's exceptional reconstruction accuracy and generalization capability. The source code is available at https://github.com/pandazcx/SNUM-Net.
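For readers unfamiliar with semi-smooth Newton updates, the toy NumPy sketch below applies one to a LASSO-type subproblem: Newton steps on the proximal fixed-point residual using an element of the generalized Jacobian. This is only a didactic stand-in for the second-order updates the abstract refers to; the learned, cross-modal SNM iteration unrolled in SNUM-Net differs, and the function name and problem setup are assumptions.

```python
# Toy semi-smooth Newton iteration for min_x 0.5*||Ax - y||^2 + lam*||x||_1,
# solving the nonsmooth fixed-point equation
#   F(x) = x - soft_threshold(x - t*A^T(Ax - y), t*lam) = 0
# with a generalized Jacobian (didactic sketch only).
import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def semismooth_newton_lasso(A, y, lam, t=None, iters=20):
    m, n = A.shape
    if t is None:
        t = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the Lipschitz constant
    x = np.zeros(n)
    I = np.eye(n)
    AtA = A.T @ A
    for _ in range(iters):
        z = x - t * (AtA @ x - A.T @ y)
        Fx = x - soft_threshold(z, t * lam)
        # element of the generalized Jacobian: active set where |z_i| > t*lam
        D = np.diag((np.abs(z) > t * lam).astype(float))
        J = I - D @ (I - t * AtA)
        x = x - np.linalg.solve(J + 1e-8 * I, Fx)   # lightly damped Newton step
    return x

# Usage on a random overdetermined problem (so A^T A is well conditioned)
A = np.random.randn(80, 50)
y = np.random.randn(80)
x_hat = semismooth_newton_lasso(A, y, lam=0.1)
```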
Citations: 0
Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-29 | DOI: 10.1109/tip.2025.3624581
Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, Honghai Liu
{"title":"Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation","authors":"Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, Honghai Liu","doi":"10.1109/tip.2025.3624581","DOIUrl":"https://doi.org/10.1109/tip.2025.3624581","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"23 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145381551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unleashing the Power of Each Distilled Image.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-28 | DOI: 10.1109/tip.2025.3624626
Jingxuan Zhang, Zhihua Chen, Lei Dai
Dataset distillation (DD) aims to accelerate the training speed of neural networks (NNs) by synthesizing a reduced dataset. NNs trained on the smaller dataset are expected to obtain almost the same test set accuracy as they do on the larger one. Previous DD research treated the obtained distilled dataset as a regular dataset for training, neglecting the overfitting issue caused by the limited number of original distilled images. In this paper, we propose a new DD paradigm. Specifically, in the deployment stage, distilled images are augmented by amplifying their local information since the teacher network can produce diverse supervision signals when receiving inputs from different regions. Efficient and diverse augmentation methods for each distilled image are devised, while ensuring the authenticity of augmented samples. Additionally, to alleviate the increased training cost caused by data augmentation, we design a bi-directional dynamic dataset pruning technique to prune the original distilled dataset and augmented distilled dataset. A new pruning strategy and scheduling are proposed based on experimental findings. Experiments on 9 benchmark datasets (CIFAR10, CIFAR100, ImageWoof, ImageCat, ImageFruit, ImageNette, ImageNet10, ImageNet100 and ImageNet1K) demonstrate the effectiveness of our approach. For instance, on the ImageNet1K dataset with a ResNet18 architecture and 50 distilled images per class, our algorithm surpasses the second-ranked MiniMax algorithm by 7.6%, achieving a distilled accuracy of 66.2%.
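The local-information amplification step can be pictured with the hypothetical sketch below: random crops of a distilled image are upsampled back to the training resolution, and each crop is scored by the teacher to obtain its own soft label. Crop selection, authenticity constraints, and the bi-directional pruning described in the abstract are not modeled, and the helper name local_crops is an assumption.

```python
# Hypothetical sketch of amplifying local information in a distilled image:
# crop several regions, upsample each to the training resolution, and query a
# teacher network for per-crop soft labels (diverse supervision signals).
import torch
import torch.nn.functional as F

def local_crops(img, crop_size=24, out_size=32, n_crops=4):
    """img: (C, H, W) distilled image -> (n_crops, C, out_size, out_size)."""
    c, h, w = img.shape
    crops = []
    for _ in range(n_crops):
        top = torch.randint(0, h - crop_size + 1, (1,)).item()
        left = torch.randint(0, w - crop_size + 1, (1,)).item()
        patch = img[:, top:top + crop_size, left:left + crop_size]
        patch = F.interpolate(patch.unsqueeze(0), size=out_size,
                              mode='bilinear', align_corners=False)
        crops.append(patch.squeeze(0))
    return torch.stack(crops)

# Usage with a dummy teacher: each crop receives its own supervision signal
teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
distilled_img = torch.rand(3, 32, 32)
crops = local_crops(distilled_img)
soft_labels = teacher(crops).softmax(dim=1)   # (n_crops, 10) targets
```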
Citations: 0
Consistent Image Layout Editing with Diffusion Models
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-28 | DOI: 10.1109/tip.2025.3623869
Tao Xia, Yudi Zhang, Ting Liu, Lei Zhang
{"title":"Consistent Image Layout Editing with Diffusion Models","authors":"Tao Xia, Yudi Zhang, Ting Liu, Lei Zhang","doi":"10.1109/tip.2025.3623869","DOIUrl":"https://doi.org/10.1109/tip.2025.3623869","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"22 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145381558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Where and What: Contextual Dynamics-aware Anomaly Detection in Surveillance Videos.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-24 | DOI: 10.1109/tip.2025.3623392
Deok-Hyun Ahn, YongJin Jo, DongBum Kim, Gi-Pyo Nam, Jae-Ho Han, Haksub Kim
In surveillance environments, detecting anomalies requires understanding the contextual dynamics of the environment, human behaviors, and movements within a scene. Effective anomaly detection must address both the where and what of events, but existing approaches such as unimodal action-based methods or LLM-integrated multimodal frameworks have limitations. These methods either rely on implicit scene information, making it difficult to localize where anomalies occur, or fail to adapt to surveillance specific challenges such as view changes, subtle actions, low light conditions, and crowded scenes. As a result, these challenges hinder accurate detection of what occurs. To overcome these limitations, our system takes advantage of features from a lightweight scene classification model to discern where an event occurs, acquiring explicit location-based context. To identify what events occur, it focuses on atomic actions, which remain underexplored in this field and are better suited to interpreting intricate abnormal behaviors than conventional abstract action features. To achieve robust anomaly detection, the proposed Temporal-Semantic Relationship Network (TSRN) models spatio-temporal relationships among multimodal features and employs a Segment-selective Focal Margin loss (SFML) to effectively address class imbalance, outperforming conventional MIL-based methods. Compared to existing methods, experimental results demonstrate that our system significantly reduces false alarms while maintaining robustness across diverse scenarios. Quantitative and qualitative evaluations on public datasets validate the practical effectiveness of the proposed method for real-world surveillance applications.
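The abstract does not spell out the Segment-selective Focal Margin loss (SFML), so the PyTorch fragment below is only a guess at its general shape: select the top-k highest-scoring segments per video and apply a focal-weighted margin between the anomalous and normal selections. Every detail here (the top-k selection, the margin of 1.0, the focal exponent, the function name) is an assumption for illustration, not the paper's formulation.

```python
# Loudly hypothetical sketch of a segment-selective, focal-weighted margin loss:
# take the k most suspicious segments from one anomalous and one normal video,
# demand a margin between their mean scores, and up-weight hard (small-gap) cases.
import torch
import torch.nn.functional as F

def sfml_sketch(scores_anom, scores_norm, k=3, margin=1.0, gamma=2.0):
    """scores_*: (num_segments,) anomaly scores in [0, 1]; returns a scalar loss."""
    top_a = scores_anom.topk(k).values          # segments most likely anomalous
    top_n = scores_norm.topk(k).values          # hardest normal segments
    # margin term: anomalous selections should exceed normal ones by `margin`
    gap = F.relu(margin - (top_a.mean() - top_n.mean()))
    # focal-style weighting: emphasize videos where the margin is badly violated
    weight = (gap / margin).clamp(0, 1) ** gamma
    return weight * gap

# Usage with dummy per-segment scores for one anomalous and one normal video
anom = torch.rand(32, requires_grad=True)
norm = torch.rand(32, requires_grad=True)
loss = sfml_sketch(torch.sigmoid(anom), torch.sigmoid(norm))
loss.backward()
```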
Citations: 0
BinoHeM: Binocular Singular Hellinger Metametric for Fine-Grained Few-Shot Classification.
IF 10.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-10-24 | DOI: 10.1109/tip.2025.3623379
Chaofei Qi, Chao Ye, Weiyang Lin, Zhitai Liu, Jianbin Qiu
Meta-metric learning has demonstrated strong performance in coarse-grained few-shot situations. However, despite their simplicity and availability, these metametrics are limited in effectively handling fine-grained few-shot scenarios. Fine-Grained Few-Shot Classification (FGFSC) presents significant challenges to the network's ability to extract subtle features. Equipped with the symmetrical binocular perception system and complex neural networks in the brain, humans inherently possess exceptional and resilient meta-learning abilities, facilitating superior management of fine-grained few-shot scenarios. In this paper, inspired by the human binocular visual system, we pioneer the first human-like meta-metric paradigm: Binocular Singular Hellinger Metametric (BinoHeM). Functionally, BinoHeM incorporates advanced symmetric binocular feature encoding and recognition mechanisms. Structurally, it integrates two binocular sensing feature encoders, a singular Hellinger metametric, and two collaborative identification mechanisms. Building on this foundation, we introduce two innovative metametric variants: BinoHeM-KDL and BinoHeM-MTL. These are grounded in two advanced training mechanisms: knowledge distillation learning (KDL) and meta-transfer learning (MTL), respectively. Furthermore, we showcase the high accuracy and robust generalization capabilities of our approaches on four representative FGFSC benchmarks. Extensive comparative and ablation experiments have validated the efficiency and superiority of our paradigm over other state-of-the-art algorithms. Our code is publicly available at: https://github.com/ChaofeiQI/BinoHeM.
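As a minimal sketch of the distance at the core of this metametric, the snippet below classifies few-shot queries by squared Hellinger distance to class prototypes. Converting embeddings to distributions with a softmax and using simple mean prototypes are assumptions for illustration; the binocular encoders and singular-value machinery of BinoHeM are not modeled here.

```python
# Minimal sketch of Hellinger-distance-based metric classification for a
# few-shot episode: H^2(p, q) = 1 - sum_i sqrt(p_i * q_i).
import torch
import torch.nn.functional as F

def hellinger_sq(p, q, eps=1e-8):
    """Squared Hellinger distance between rows of p (queries) and q (prototypes).
    p: (Q, D), q: (N, D); both are turned into distributions over D dimensions."""
    p = F.softmax(p, dim=1).unsqueeze(1)        # (Q, 1, D)
    q = F.softmax(q, dim=1).unsqueeze(0)        # (1, N, D)
    bc = (p * q).clamp_min(eps).sqrt().sum(-1)  # Bhattacharyya coefficient (Q, N)
    return 1.0 - bc

# 5-way episode: prototypes are mean support embeddings; queries are scored by
# negative Hellinger distance (smaller distance -> higher logit).
support = torch.randn(5, 3, 64)                  # (ways, shots, feat_dim)
prototypes = support.mean(dim=1)                 # (5, 64)
queries = torch.randn(10, 64)
logits = -hellinger_sq(queries, prototypes)      # (10, 5)
pred = logits.argmax(dim=1)
```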
Citations: 0