
IEEE Transactions on Image Processing: Latest Articles

HR-SemNet: A High-Resolution Network for Enhanced Small Object Detection With Local Contextual Semantics.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-23. DOI: 10.1109/tip.2026.3654770
Can Peng, Manxin Chao, Ruoyu Li, Zaiqing Chen, Lijun Yun, Yuelong Xia
Using higher-resolution feature maps in the network is an effective approach for detecting small objects. However, high-resolution feature maps face the challenge of lacking semantic information. This has led previous methods to rely on downsampling feature maps, applying large-kernel convolution layers, and then upsampling the feature maps to obtain semantic information. However, these methods have certain limitations. First, large-kernel convolutions in deeper layers typically provide significant global semantic information, but our experiments reveal that such prominent semantic information introduces background smear, which in turn leads to overfitting. Second, deep features often contain substantial redundant information, and the features of small objects are either minimal or have disappeared, which causes a degradation in detection performance when directly relying on deep features. To address these issues, we propose a high-resolution network based on local contextual semantics (HR-SemNet). The network is built on the proposed high-resolution backbone (HRB), which replaces the traditional backbone-FPN architecture by focusing all computational resources of large-kernel convolutions on high-resolution feature layers to capture clearer features of small objects. Additionally, a local context semantic module (LCSM) is employed to extract semantic information from the background, confining the semantic extraction to a local window to avoid interference from large-scale backgrounds and objects. HR-SemNet decouples small object semantics from contextual semantics, with HRB and LCSM independently extracting these features. Extensive experiments and comprehensive evaluations on the VisDrone, AI-TOD, and TinyPerson datasets validate the effectiveness of the method. On the VisDrone dataset, which contains a large number of small objects, HR-SemNet improves the mean average precision (mAP) by 4.6%, reduces the computational cost (GFLOPs) by 49.9%, and decreases the parameter count by 94.9%.
Citations: 0
Unc-SOD: An Uncertainty Learning Framework for Small Object Detection.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-22. DOI: 10.1109/tip.2026.3654892
Xiang Yuan, Gong Cheng, Jiacheng Cheng, Ruixiang Yao, Junwei Han
Small object detection (SOD) constitutes a notable yet immensely arduous task, stemming from the restricted informative regions inherent in size-limited instances, which further sparks heightened uncertainty beyond the capacity of current two-stage detectors. Specifically, the intrinsic ambiguity in small objects undermines the prevailing sampling paradigms and may mislead the model to devote futile effort to those unrecognizable targets, while the inconsistency of features utilized for detection at the two stages further exposes the hierarchical uncertainty. In this paper, we develop an Uncertainty learning framework for Small Object Detection, dubbed Unc-SOD. By incorporating an auxiliary uncertainty branch into the conventional Region Proposal Network (RPN), we model the indeterminacy at the instance level, which later serves as a surrogate criterion for sampling, thereby unearthing adequate candidates dynamically based on the varying degrees of uncertainty and facilitating the learning of proposal networks. In parallel, a Perception-and-Interaction strategy is devised to capture rich and discriminative representations by optimizing the intrinsic properties of the regional features at the original pyramid level and the assigned one, in which the perceptual process unfolds in a mutual paradigm. As the seminal attempt to model uncertainty in the SOD task, our Unc-SOD yields state-of-the-art performance on two large-scale small object detection benchmarks, SODA-D and SODA-A, and the results on several SOD-oriented datasets, including COCO, VisDrone, and Tsinghua-Tencent 100K, also show improvements over the baseline detector. This underscores the efficacy of our approach and its superiority over prevailing detectors when dealing with small instances.
Citations: 0
Video Decoupling Networks for Accurate, Efficient, Generalizable, and Robust Video Object Segmentation.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-21. DOI: 10.1109/tip.2025.3649360
Jisheng Dang, Huicheng Zheng, Yulan Guo, Jianhuang Lai, Bin Hu, Tat-Seng Chua
Video object segmentation (VOS) is a fundamental task in video analysis, aiming to accurately recognize and segment objects of interest within video sequences. Conventional methods, relying on memory networks to store single-frame appearance features, face challenges in computational efficiency and in capturing dynamic visual information effectively. To address these limitations, we present a Video Decoupling Network (VDN) with a per-clip memory updating mechanism. Our approach is inspired by the dual-stream hypothesis of the human visual cortex and decomposes multiple previous video frames into fundamental elements: scene, motion, and instance. We propose the Unified Prior-based Spatio-temporal Decoupler (UPSD) algorithm, which parses multiple frames into basic elements in a unified manner. UPSD continuously stores elements over time, enabling adaptive integration of different cues based on task requirements. This decomposition mechanism facilitates comprehensive spatio-temporal information capture and rapid updating, leading to notable enhancements in overall VOS performance. Extensive experiments conducted on multiple VOS benchmarks validate the state-of-the-art accuracy, efficiency, generalizability, and robustness of our approach. Remarkably, VDN demonstrates a significant performance improvement and a substantial speed-up compared to previous state-of-the-art methods on multiple VOS benchmarks. It also exhibits excellent generalizability under domain shift and robustness against various noise types.
Citations: 0
IEEE Transactions on Image Processing publication information
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-15. DOI: 10.1109/tip.2026.3651208
Citations: 0
Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-15. DOI: 10.1109/tip.2026.3651951
Guangqian Guo, Pengfei Chen, Yong Guo, Huafeng Chen, Boqiang Zhang, Shan Gao
Citations: 0
Dual Domain Optimization Algorithm for CBCT Ring Artifact Correction.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-15. DOI: 10.1109/tip.2026.3652008
Yanwei Qin, Xiaohui Su, Xin Lu, Baodi Yu, Yunsong Zhao, Fanyong Meng
Compared to traditional computed tomography (CT), photon-counting detector (PCD)-based CT provides significant advantages, including enhanced CT image contrast and reduced radiation dose. However, owing to the current immaturity of PCD technology, scanned PCD data often contain stripe artifacts resulting from non-functional or defective detector units, which subsequently introduce ring artifacts in reconstructed CT images. The presence of ring artifacts may compromise the accuracy of CT values and even introduce pseudo-structures, thereby reducing the application value of CT images. In this paper, we propose a dual-domain optimization model that takes advantage of the distribution characteristics of stripe artifacts in 3D projection data and the prior features of reconstructed 3D CT images. Specifically, we demonstrate that stripe artifacts in 3D projection data exhibit both group sparsity and low-rank properties. Building on this observation, we propose a TLT (TV-l2,1-Tucker) model to eliminate ring artifacts in PCD-based cone beam CT (CBCT). In addition, an efficient iterative algorithm is designed to solve the proposed model. The effectiveness of both the model and the algorithm is evaluated through simulated and real data experiments. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches.
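The group-sparsity and low-rank claim in this abstract can be illustrated with a toy example. The sketch below is not the paper's TLT model, whose formulation the abstract does not give. It assumes a simplified 2D slice of projection data (views by detector columns) in which each defective detector unit adds the same offset in every view. The names `l21_norm`, `nonzero_columns`, `offsets`, and `stripes` are hypothetical and used only for illustration.

```python
import math

def l21_norm(matrix):
    """l2,1 norm: sum of the Euclidean norms of the columns (one group per detector column)."""
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix))
        for j in range(n_cols)
    )

def nonzero_columns(matrix):
    """Indices of columns that contain at least one nonzero entry."""
    return [j for j in range(len(matrix[0])) if any(row[j] != 0 for row in matrix)]

# Hypothetical toy data: 4 projection views x 6 detector columns.
# Columns 1 and 4 stand in for defective detector units with fixed offsets
# (assumption: the offset does not change from view to view).
n_views = 4
offsets = [0.0, 3.0, 0.0, 0.0, -4.0, 0.0]
stripes = [list(offsets) for _ in range(n_views)]

# Group sparsity: only the defective columns carry any energy.
print(nonzero_columns(stripes))  # [1, 4]

# The l2,1 norm sums one l2 norm per column: sqrt(4*9) + sqrt(4*16) = 6 + 8.
print(l21_norm(stripes))  # 14.0

# Low rank: every row equals `offsets`, so the matrix is an outer product (rank 1).
print(all(row == offsets for row in stripes))  # True
```

Under these assumptions the stripe component is both column-sparse and rank 1, which is the kind of structure an l2,1 penalty and a low-rank (for 3D data, Tucker-style) term are typically used to capture.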
Citations: 0
BELE: Blur Equivalent Linearized Estimator
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-13. DOI: 10.1109/tip.2026.3651959
Paolo Giannitrapani, Elio D. Di Claudio, Giovanni Jacovitti
Citations: 0
Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-13. DOI: 10.1109/tip.2026.3652014
Hongsheng Zhang, Zhong Ji, Jingren Liu, Yanwei Pang, Jungong Han
Citations: 0
Progressive Feature Encoding with Background Perturbation Learning for Ultra-Fine-Grained Visual Categorization.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-13. DOI: 10.1109/tip.2026.3651956
Xin Jiang, Ziye Fang, Fei Shen, Junyao Gao, Zechao Li
Ultra-Fine-Grained Visual Categorization (Ultra-FGVC) aims to classify objects into sub-granular categories, presenting the challenge of distinguishing visually similar objects with limited data. Existing methods primarily address sample scarcity but often overlook the importance of leveraging intrinsic object features to construct highly discriminative representations. This limitation significantly constrains their effectiveness in Ultra-FGVC tasks. To address these challenges, we propose SV-Transformer, which progressively encodes object features while incorporating background perturbation modeling to generate robust and discriminative representations. At the core of our approach is a progressive feature encoder, which hierarchically extracts global semantic structures and local discriminative details from backbone-generated representations. This design enhances inter-class separability while ensuring resilience to intra-class variations. Furthermore, our background perturbation learning mechanism introduces controlled variations in the feature space, effectively mitigating the impact of sample limitations and improving the model's capacity to capture fine-grained distinctions. Comprehensive experiments demonstrate that SV-Transformer achieves state-of-the-art performance on benchmark Ultra-FGVC datasets, showcasing its efficacy in addressing the challenges of the Ultra-FGVC task.
Citations: 0
COSOS-1k: A Benchmark Dataset and Occlusion-aware Uncertainty Learning for Multi-view Video Object Detection.
IF 10.6, Zone 1 (Computer Science), Q1 (Computer Science, Artificial Intelligence). Pub Date: 2026-01-13. DOI: 10.1109/tip.2026.3651950
Wenjie Yang, Yueying Kao, Tong Liu, Yuanlong Yu, Kaiqi Huang
Confined spaces refer to partially or fully enclosed areas, e.g., sewage wells, where working conditions pose significant risks to the workers. The evaluation of COnfined Space Operational Safety (COSOS) refers to verifying whether workers are properly equipped with safety equipment before entering a confined space, which is crucial for protecting their safety and health. Due to the crowded nature of such environments and the small size of certain safety equipment, existing methods face significant challenges. Moreover, there is a lack of dedicated datasets to support research in this domain. In this paper, in order to advance research in this challenging task, we present COSOS-1k, an extensive dataset constructed from diverse confined space scenarios. It comprises multi-view videos for each scenario, covers 10 essential items of safety protective equipment and 6 worker attributes, and is annotated with expressive object locations, fine-grained attributes, and occlusion status. COSOS-1k is the first dataset known to date that is tailored explicitly for real-world COSOS scenarios. In addition, we address the challenge of occlusion from three perspectives: instance, video, and view. Firstly, at the instance level, we propose the Occlusion-aware Uncertainty Estimation (OUE) method, which leverages box-level occlusion annotations to enable part-level occlusion prediction for objects. Secondly, at the video level, we introduce Cross-Frame Cluster (CFC) attention, which integrates temporal context features from the same object category to mitigate the impact of occlusions in the current frame. Finally, we extend CFC to the view level and form Cross-View Cluster (CVC) attention, where complementary information is mined from another view. Extensive experiments demonstrate the effectiveness of the proposed methods and provide insights into the importance of dataset diversity and expressivity. The COSOS-1k dataset and code are available at https://github.com/deepalchemist/cosos-1k.
Citations: 0