Pub Date: 2025-11-06 · DOI: 10.1109/TPAMI.2025.3630178
Liping Deng;Maziar Raissi;MingQing Xiao
Sequential Model-Based Optimization (SMBO) is a highly effective strategy for hyperparameter search in machine learning. It utilizes a surrogate model that fits previous trials and approximates the hyperparameter response surface (performance). This surrogate model primarily guides the decision-making process for selecting the next set of hyperparameters. Existing classic surrogates, such as Gaussian processes and random forests, focus solely on the current task of interest and cannot incorporate trials from historical tasks. This limitation hinders their efficacy in various applications. Inspired by the state-of-the-art convolutional neural process, this paper proposes a novel meta-learning-based surrogate model for efficient and effective hyperparameter optimization. Our surrogate is trained on the meta-knowledge from a range of historical tasks, enabling it to accurately predict the hyperparameter response surface even with a limited number of trials on a new task. We tested our approach on the hyperparameter selection problem for the well-known support vector machine (SVM), residual neural network (ResNet), and vision transformer (ViT) across hundreds of real-world classification datasets. The empirical results demonstrate its superiority over existing surrogate models, highlighting the effectiveness of meta-learning in hyperparameter optimization.
{"title":"Meta-Learning-Based Surrogate Models for Efficient Hyperparameter Optimization","authors":"Liping Deng;Maziar Raissi;MingQing Xiao","doi":"10.1109/TPAMI.2025.3630178","DOIUrl":"10.1109/TPAMI.2025.3630178","url":null,"abstract":"Sequential Model-Based Optimization (SMBO) is a highly effective strategy for hyperparameter search in machine learning. It utilizes a surrogate model that fits previous trials and approximates the hyperparameter response surface (performance). This surrogate model primarily guides the decision-making process for selecting the next set of hyperparameters. Existing classic surrogates, such as Gaussian processes and random forests, focus solely on the current task of interest and cannot incorporate trials from historical tasks. This limitation hinders their efficacy in various applications. Inspired by the state-of-the-art convolutional neural process, this paper proposes a novel meta-learning-based surrogate model for efficient and effective hyperparameter optimization. Our surrogate is trained on the meta-knowledge from a range of historical tasks, enabling it to accurately predict the hyperparameter response surface even with a limited number of trials on a new task. We tested our approach on the hyperparameter selection problem for the well-known support vector machine (SVM), residual neural network (ResNet), and vision transformer (ViT) across hundreds of real-world classification datasets. 
The empirical results demonstrate its superiority over existing surrogate models, highlighting the effectiveness of meta-learning in hyperparameter optimization.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 3","pages":"3931-3938"},"PeriodicalIF":18.6,"publicationDate":"2025-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145454731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
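As background, the generic SMBO loop the abstract builds on (fit a surrogate to past trials, then choose the next hyperparameter by maximizing an acquisition score) can be sketched with a toy surrogate. Everything here is illustrative: the objective, the 1-nearest-neighbour surrogate, and the exploration bonus are hypothetical stand-ins, not the paper's meta-learned model.

```python
import random

def objective(log_lr):
    # hypothetical validation accuracy as a function of log10(learning rate);
    # peaks at log_lr = -3
    return 1.0 - (log_lr + 3.0) ** 2 / 10.0

def surrogate_predict(trials, x):
    # 1-nearest-neighbour surrogate: predict the score of the closest past
    # trial, returning the distance as a crude uncertainty proxy
    xs, ys = zip(*trials)
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i], abs(xs[i] - x)

def smbo(n_iters=20, seed=0):
    rng = random.Random(seed)
    trials = [(-6.0, objective(-6.0)), (0.0, objective(0.0))]  # initial trials
    for _ in range(n_iters):
        candidates = [rng.uniform(-6.0, 0.0) for _ in range(100)]

        def acq(x):
            # acquisition: predicted score plus a bonus for unexplored regions
            mu, dist = surrogate_predict(trials, x)
            return mu + 0.3 * dist

        x_next = max(candidates, key=acq)          # pick the next trial
        trials.append((x_next, objective(x_next)))  # evaluate and record it
    return max(trials, key=lambda t: t[1])          # best trial found

best_x, best_y = smbo()
```

The loop quickly concentrates trials near the optimum at log_lr = -3 because unexplored regions get an exploration bonus before the surrogate's predictions take over.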
Pub Date: 2025-10-27 · DOI: 10.1109/TPAMI.2025.3626134
Banglei Guan;Ji Zhao
We present a novel method to compute the relative pose of multi-camera systems using two affine correspondences (ACs). Existing solutions to multi-camera relative pose estimation are either restricted to special cases of motion, have too high computational complexity, or require too many point correspondences (PCs). Thus, these solvers impede efficient or accurate relative pose estimation when RANSAC is applied as a robust estimator. This paper shows that the 6DOF relative pose estimation problem using ACs permits a feasible minimal solution when the geometric constraints between ACs and multi-camera systems are exploited using a special parameterization. We present a problem formulation based on two ACs that encompasses the two common types of ACs across two views, i.e., inter-camera and intra-camera. Moreover, we develop a unified and versatile framework for generating 6DOF solvers. Building upon this foundation, we use this framework to address two categories of practical scenarios. First, for the more challenging 7DOF relative pose estimation problem, where the scale transformation of multi-camera systems is unknown, we propose 7DOF solvers that compute the relative pose and scale using three ACs. Second, leveraging inertial measurement units (IMUs), we introduce several minimal solvers for constrained relative pose estimation problems, including 5DOF solvers with a known relative rotation angle and a 4DOF solver with a known vertical direction. Experiments on both virtual and real multi-camera systems demonstrate that the proposed solvers are more efficient than state-of-the-art algorithms while achieving better relative pose accuracy.
{"title":"Affine Correspondences Between Multi-Camera Systems for Relative Pose Estimation","authors":"Banglei Guan;Ji Zhao","doi":"10.1109/TPAMI.2025.3626134","DOIUrl":"10.1109/TPAMI.2025.3626134","url":null,"abstract":"We present a novel method to compute the relative pose of multi-camera systems using two affine correspondences (ACs). Existing solutions to the multi-camera relative pose estimation are either restricted to special cases of motion, have too high computational complexity, or require too many point correspondences (PCs). Thus, these solvers impede an efficient or accurate relative pose estimation when applying RANSAC as a robust estimator. This paper shows that the 6DOF relative pose estimation problem using ACs permits a feasible minimal solution, when exploiting the geometric constraints between ACs and multi-camera systems using a special parameterization. We present a problem formulation based on two ACs that encompass two common types of ACs across two views, i.e., inter-camera and intra-camera. Moreover, we exploit a unified and versatile framework for generating 6DOF solvers. Building upon this foundation, we use this framework to address two categories of practical scenarios. First, for the more challenging 7DOF relative pose estimation problem—where the scale transformation of multi-camera systems is unknown—we propose 7DOF solvers to compute the relative pose and scale using three ACs. Second, leveraging inertial measurement units (IMUs), we introduce several minimal solvers for constrained relative pose estimation problems. These include 5DOF solvers with known relative rotation angle, and 4DOF solver with known vertical direction. 
Experiments on both virtual and real multi-camera systems prove that the proposed solvers are more efficient than the state-of-the-art algorithms, while resulting in a better relative pose accuracy.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"2012-2029"},"PeriodicalIF":18.6,"publicationDate":"2025-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145380504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
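As background for the geometric constraints such solvers exploit, a single point correspondence between two calibrated views must satisfy the epipolar constraint x2ᵀ E x1 = 0 with essential matrix E = [t]ₓ R. The sketch below verifies this on synthetic data; the pose and points are made up for illustration, and the paper's setting (affine correspondences, generalized cameras) is richer than this single-camera toy.

```python
import math

def rot_z(theta):
    # rotation about the z-axis
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def skew(t):
    # cross-product matrix [t]_x
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

# hypothetical relative pose, with convention X2 = R X1 + t
R = rot_z(0.3)
t = [1.0, 0.2, 0.0]
E = matmul(skew(t), R)  # essential matrix E = [t]_x R

X1 = [0.5, -0.4, 4.0]                        # a 3D point in camera-1 coordinates
X2 = [a + b for a, b in zip(matvec(R, X1), t)]
x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]     # normalized image coordinates
x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]

residual = sum(a * b for a, b in zip(x2, matvec(E, x1)))  # x2^T E x1, zero for a true match

x1_wrong = [0.9, 0.3, 1.0]  # a mismatched observation violates the constraint
residual_wrong = sum(a * b for a, b in zip(x2, matvec(E, x1_wrong)))
```

Minimal solvers assemble enough such constraints (here from ACs rather than single points) to pin down R and t with the fewest correspondences.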
Generalisation to unseen objects is very challenging in the 6D pose estimation task. While Vision-Language Models (VLMs) enable the use of natural language descriptions to support 6D pose estimation of unseen objects, these solutions underperform compared to model-based methods. In this work we present Horyon, an open-vocabulary VLM-based architecture that addresses relative pose estimation between two scenes of an unseen object described only by a textual prompt. We use the textual prompt to identify the unseen object in the scenes and then obtain high-resolution multi-scale features, which are used to extract cross-scene matches for registration. We evaluate our model on a benchmark with a large variety of unseen objects across four datasets, namely REAL275, Toyota-Light, Linemod, and YCB-Video. Our method achieves state-of-the-art performance on all datasets, outperforming the previous best-performing approach by 12.6 points in Average Recall.
{"title":"High-Resolution Open-Vocabulary Object 6D Pose Estimation","authors":"Jaime Corsetti;Davide Boscaini;Francesco Giuliari;Changjae Oh;Andrea Cavallaro;Fabio Poiesi","doi":"10.1109/TPAMI.2025.3624589","DOIUrl":"10.1109/TPAMI.2025.3624589","url":null,"abstract":"The generalisation to unseen objects in the 6D pose estimation task is very challenging. While Vision-Language Models (VLMs) enable using natural language descriptions to support 6D pose estimation of unseen objects, these solutions underperform compared to model-based methods. In this work we present Horyon, an open-vocabulary VLM-based architecture that addresses relative pose estimation between two scenes of an unseen object, described by a textual prompt only. We use the textual prompt to identify the unseen object in the scenes and then obtain high-resolution multi-scale features. These features are used to extract cross-scene matches for registration. We evaluate our model on a benchmark with a large variety of unseen objects across four datasets, namely REAL275, Toyota-Light, Linemod, and YCB-Video. Our method achieves state-of-the-art performance on <italic>all</i> datasets, outperforming by 12.6 in Average Recall the previous best-performing approach.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"2066-2077"},"PeriodicalIF":18.6,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145357313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that specifically preserves video semantics by jointly mining and compressing them in a self-supervised manner. While MVM is proficient at learning generalizable semantics through the masked patch prediction task, it may also encode non-semantic information such as trivial textural details, wasting bits and introducing semantic noise. To suppress this, we explicitly regularize the non-semantic entropy of the compressed video in the MVM token space. The proposed framework is instantiated as a simple Semantic-Mining-then-Compression (SMC) model. Furthermore, we extend SMC into an advanced SMC++ model in several respects. First, we equip it with a masked motion prediction objective, leading to better temporal semantic learning. Second, we introduce a Transformer-based compression module to improve semantic compression efficacy. Since directly mining the complex redundancy among heterogeneous features from different coding stages is non-trivial, we introduce a compact blueprint semantic representation that aligns these features into a similar form, fully unleashing the power of the Transformer-based compression module. Extensive results demonstrate that the proposed SMC and SMC++ models are markedly superior to previous traditional, learnable, and perceptual-quality-oriented video codecs on three video analysis tasks and seven datasets.
{"title":"SMC++: Masked Learning of Unsupervised Video Semantic Compression","authors":"Yuan Tian;Xiaoyue Ling;Cong Geng;Qiang Hu;Guo Lu;Guangtao Zhai","doi":"10.1109/TPAMI.2025.3625063","DOIUrl":"10.1109/TPAMI.2025.3625063","url":null,"abstract":"Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during the compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics, by jointly mining and compressing the semantics in a self-supervised manner. While MVM is proficient at learning generalizable semantics through the masked patch prediction task, it may also encode non-semantic information like trivial textural details, wasting bitcost and bringing semantic noises. To suppress this, we explicitly regularize the non-semantic entropy of the compressed video in the MVM token space. The proposed framework is instantiated as a simple Semantic-Mining-then-Compression (SMC) model. Furthermore, we extend SMC as an advanced SMC++ model from several aspects. First, we equip it with a masked motion prediction objective, leading to better temporal semantic learning ability. Second, we introduce a Transformer-based compression module, to improve the semantic compression efficacy. Considering that directly mining the complex redundancy among heterogeneous features in different coding stages is non-trivial, we introduce a compact blueprint semantic representation to align these features into a similar form, fully unleashing the power of the Transformer-based compression module. 
Extensive results demonstrate the proposed SMC and SMC++ models show remarkable superiority over previous traditional, learnable, and perceptual quality-oriented video codecs, on three video analysis tasks and seven datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"1992-2011"},"PeriodicalIF":18.6,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145357314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
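As a minimal illustration of the masked-prediction objective underlying MVM (not the paper's actual loss, which operates on learned tokens and motion), the reconstruction error is computed only over masked patches, so visible patches contribute nothing:

```python
def masked_prediction_loss(patches, predictions, mask):
    # squared error summed over masked patches only, averaged per masked patch;
    # visible patches carry no reconstruction signal, as in masked modelling
    masked = [(p, q) for p, q, m in zip(patches, predictions, mask) if m]
    return sum((a - b) ** 2 for p, q in masked for a, b in zip(p, q)) / max(len(masked), 1)

# toy 2-dimensional "patches": patches 1 and 2 are masked and must be predicted
patches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
preds   = [[0.0, 0.0], [3.0, 5.0], [5.0, 6.0]]
mask    = [False, True, True]
loss = masked_prediction_loss(patches, preds, mask)
```

Here patch 0 is visible, so its (deliberately bad) prediction is ignored; the loss reflects only the one-unit error on masked patch 1.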
Unpaired image restoration (UIR) is a significant task due to the difficulty of acquiring paired degraded/clear images with identical backgrounds. In this paper, we propose a novel UIR method based on the assumption that an image contains both degradation-related features, which affect the level of degradation, and degradation-unrelated features, such as texture and semantic information. Our method aims to ensure that the degradation-related features of the restoration result closely resemble those of a clear image, while the degradation-unrelated features align with the input degraded image. Specifically, we introduce a Feature Orthogonalization Module, optimized on the Stiefel manifold, to decouple image features and ensure feature uncorrelation. A task-driven Depth-wise Feature Classifier is proposed to assign weights to the uncorrelated features based on their relevance to degradation prediction. To keep training from depending on the quality of the clear image in any single pair of input data, we maintain several degradation-related proxies describing the degradation level of clear images, enhancing the model's robustness. Finally, a weighted PatchNCE loss is introduced to pull degradation-related features in the output image toward those of clear images, while bringing degradation-unrelated features close to those of the degraded input.
{"title":"Orthogonal Decoupling Contrastive Regularization: Toward Uncorrelated Feature Decoupling for Unpaired Image Restoration","authors":"Zhongze Wang;Jingchao Peng;Haitao Zhao;Lujian Yao;Kaijie Zhao","doi":"10.1109/TPAMI.2025.3620803","DOIUrl":"10.1109/TPAMI.2025.3620803","url":null,"abstract":"Unpaired image restoration (UIR) is a significant task due to the difficulty of acquiring paired degraded/clear images with identical backgrounds. In this paper, we propose a novel UIR method based on the assumption that an image contains both degradation-related features, which affect the level of degradation, and degradation-unrelated features, such as texture and semantic information. Our method aims to ensure that the degradation-related features of the restoration result closely resemble those of the clear image, while the degradation-unrelated features align with the input degraded image. Specifically, we introduce a Feature Orthogonalization Module optimized on Stiefel manifold to decouple image features, ensuring feature uncorrelation. A task-driven Depth-wise Feature Classifier is proposed to assign weights to uncorrelated features based on their relevance to degradation prediction. To avoid the dependence of the training process on the quality of the clear image in a single pair of input data, we propose to maintain several degradation-related proxies describing the degradation level of clear images to enhance the model’s robustness. 
Finally, a weighted PatchNCE loss is introduced to pull degradation-related features in the output image toward those of clear images, while bringing degradation-unrelated features close to those of the degraded input.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"1842-1859"},"PeriodicalIF":18.6,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145331646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
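The weighted PatchNCE loss builds on the standard InfoNCE contrastive objective, which can be sketched directly. The feature vectors and temperature below are illustrative, and the paper's per-patch weighting is omitted:

```python
import math

def info_nce(query, positive, negatives, tau=0.07):
    # InfoNCE: -log( exp(q.p/tau) / (exp(q.p/tau) + sum_n exp(q.n/tau)) )
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(query, positive) / tau)
    neg = sum(math.exp(dot(query, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

q = [1.0, 0.0]
# aligned positive -> near-zero loss; mismatched positive -> large loss
loss_aligned  = info_nce(q, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_mismatch = info_nce(q, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

Minimizing this loss pulls the query toward its positive (here, the clear image's degradation-related features) and pushes it away from the negatives, which is exactly the pull/push behavior the abstract describes.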
Text-to-image customization aims to generate images that align with both the given text and the subject in the given image. Existing works follow the pseudo-word paradigm, which represents the subject as a non-existent pseudo word and combines it with other text to generate images. However, the pseudo word inherently conflicts and entangles with other real words, resulting in a dual-optimum paradox between subject similarity and text controllability. To address this, we propose RealCustom++, a novel real-word paradigm that represents the subject with a non-conflicting real word to generate a coherent guidance image and corresponding subject mask, thereby disentangling the influence scopes of the text and subject for simultaneous optimization. Specifically, RealCustom++ introduces a train-inference decoupled framework: (1) during training, it learns a general alignment between visual conditions and all real text words; and (2) during inference, a dual-branch architecture is employed, where the Guidance Branch produces the subject guidance mask and the Generation Branch utilizes this mask to customize the generation of the specific real word exclusively within subject-relevant regions. Extensive experiments validate RealCustom++'s superior performance, which improves controllability by 7.48%, similarity by 3.04%, and quality by 76.43% simultaneously. Moreover, RealCustom++ further improves controllability by 4.6% and multi-subject similarity by 6.34% for multi-subject customization.
{"title":"RealCustom++: Representing Images as Real Textual Word for Real-Time Customization","authors":"Zhendong Mao;Mengqi Huang;Fei Ding;Mingcong Liu;Qian He;Yongdong Zhang","doi":"10.1109/TPAMI.2025.3623025","DOIUrl":"10.1109/TPAMI.2025.3623025","url":null,"abstract":"Text-to-image customization aims to generate images that align with both the given text and the subject in the given image. Existing works follow the pseudo-word paradigm, which represents the subject as a non-existent pseudo word and combines it with other text to generate images. However, the pseudo word inherently conflicts and entangles with other real words, resulting in a dual-optimum paradox between the subject similarity and text controllability. To address this, we propose RealCustom++, a novel real-word paradigm that represents the subject with a non-conflicting real word to generate a coherent guidance image and corresponding subject mask, there by disentangling the influence scopes of the text and subject for simultaneous optimization. Specifically, RealCustom++ introduces a train-inference decoupled framework: (1) during training, it learns a general alignment between visual conditions and all real text words; and (2) during inference, a dual-branch architecture is employed, where the Guidance Branch produces the subject guidance mask, and the Generation Branch utilizes this mask to customize the generation of the specific real word exclusively within subject-relevant regions. Extensive experiments validate RealCustom++s superior performance, which improves controllability by 7.48%, similarity by 3.04% and quality by 76.43% simultaneously. 
Moreover, RealCustom++ further improves controllability by 4.6% and multi-subject similarity by 6.34% for multisubject customization","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"2078-2095"},"PeriodicalIF":18.6,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-15 · DOI: 10.1109/TPAMI.2025.3621631
Jing Wang;Yongchao Xu;Jing Tang;Zeyu Gong;Bo Tao;Clarence W. de Silva;Xiang Bai
A central challenge in source-free domain adaptation (SFDA) is the lack of a theoretical framework for explicitly analyzing domain shifts, as the absence of source data prevents direct domain comparisons. In this paper, we introduce the Vicinal Gaussian Transform (VGT), an analytical operator that models source-informed latent vicinities as Gaussians and shows that vicinal prediction divergence is bounded by their covariance. By this formulation, SFDA can be reframed as shrinking covariance to reinforce label consistency. To operationalize this idea, we introduce the Energy-based VGT (EBVGT), a novel SDE that realizes the Gaussian transform by contracting covariance through a denoising mechanism. A recovery-likelihood with a Schrödinger-Bridge smoothness penalty denoises perturbed states, while a BYOL-derived energy function, directly obtained from model predictions, provides the score to guide label-consistent trajectories within the vicinity. This design not only yields noise-suppressed vicinal features for adaptation without source data, but also eliminates the need for additional learnable parameters for score estimation, in contrast to conventional deep SDEs. Our EBVGT is model- and modality-agnostic, efficient for classification, and improves state-of-the-art SFDA methods by 1.3–3.0% (2.0% on average) across both 2D image and 3D point cloud benchmarks.
{"title":"Vicinal Gaussian Transform: Rethinking Source-Free Domain Adaptation Through Source-Informed Label Consistency","authors":"Jing Wang;Yongchao Xu;Jing Tang;Zeyu Gong;Bo Tao;Clarence W. de Silva;Xiang Bai","doi":"10.1109/TPAMI.2025.3621631","DOIUrl":"10.1109/TPAMI.2025.3621631","url":null,"abstract":"A central challenge in source-free domain adaptation (SFDA) is the lack of a theoretical framework for explicitly analyzing domain shifts, as the absence of source data prevents direct domain comparisons. In this paper, we introduce the Vicinal Gaussian Transform (VGT), an analytical operator that models source-informed latent vicinities as Gaussians and shows that vicinal prediction divergence is bounded by their covariance. By this formulation, SFDA can be reframed as shrinking covariance to reinforce label consistency. To operationalize this idea, we introduce the Energy-based VGT (EBVGT), a novel SDE that realizes the Gaussian transform by contracting covariance through a denoising mechanism. A recovery-likelihood with a Schrödinger-Bridge smoothness penalty denoises perturbed states, while a BYOL-derived energy function, directly obtained from model predictions, provides the score to guide label-consistent trajectories within the vicinity. This design not only yields noise-suppressed vicinal features for adaptation without source data, but also eliminates the need for additional learnable parameters for score estimation, in contrast to conventional deep SDEs. 
Our EBVGT is model- and modality-agnostic, efficient for classification, and improves state-of-the-art SFDA methods by 1.3–3.0% (2.0% on average) across both 2D image and 3D point cloud benchmarks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"2030-2047"},"PeriodicalIF":18.6,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145295630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
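The paper's bound, that vicinal prediction divergence is controlled by the vicinity's covariance, can be illustrated numerically: shrinking a Gaussian vicinity's covariance reduces the variance of a classifier's predictions over that vicinity. The classifier and numbers below are made-up stand-ins, not the paper's model:

```python
import math
import random

def predict(x):
    # a hypothetical smooth classifier logit; any nonlinear function works here
    return math.tanh(2.0 * x[0] - x[1])

def vicinal_prediction_variance(center, sigma, n=5000, seed=0):
    # sample an isotropic Gaussian vicinity around `center` and measure the
    # spread of the classifier's predictions over it
    rng = random.Random(seed)
    preds = [predict([c + rng.gauss(0.0, sigma) for c in center]) for _ in range(n)]
    mean = sum(preds) / n
    return sum((p - mean) ** 2 for p in preds) / n

center = [0.2, -0.1]
var_wide   = vicinal_prediction_variance(center, sigma=1.0)  # large covariance
var_shrunk = vicinal_prediction_variance(center, sigma=0.1)  # contracted covariance
```

Contracting the covariance (the role the denoising SDE plays in EBVGT) makes predictions within the vicinity more consistent, which is the label-consistency effect the formulation targets.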
Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function. To eschew this complexity, noise-contrastive estimation (NCE) has been proposed, formulating the objective as the logistic loss between the real data and artificial noise. However, previous research indicates that NCE may perform poorly in many tasks due to its flat loss landscape and slow convergence. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models through the lens of compositional optimization. To tackle the partition function, a noise distribution is introduced such that the log partition function can be expressed as a compositional function whose inner function can be estimated using stochastic samples. Consequently, the objective can be optimized via stochastic compositional optimization algorithms. Despite being a simple method, we demonstrate that it is more favorable than NCE by (1) establishing a fast convergence rate and quantifying its dependence on the noise distribution through the variance of stochastic estimators; (2) obtaining better results for Gaussian mean estimation by showing that our method has a much more favorable loss landscape and enjoys faster convergence; and (3) demonstrating better performance on various applications, including density estimation, out-of-distribution detection, and real image generation.
{"title":"Optimizing Unnormalized Statistical Models Through Compositional Optimization","authors":"Wei Jiang;Jiayu Qin;Lingyu Wu;Changyou Chen;Tianbao Yang;Lijun Zhang","doi":"10.1109/TPAMI.2025.3621320","DOIUrl":"10.1109/TPAMI.2025.3621320","url":null,"abstract":"Learning unnormalized statistical models (e.g., energy-based models) is computationally challenging due to the complexity of handling the partition function. To eschew this complexity, noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss between the real data and the artificial noise. However, previous research indicates that NCE may perform poorly in many tasks due to its flat loss landscape and slow convergence. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models through the lens of compositional optimization. To tackle the partition function, a noise distribution is introduced such that the log partition function can be expressed as a compositional function whose inner function can be estimated using stochastic samples. Consequently, the objective can be optimized via stochastic compositional optimization algorithms. 
Despite being a simple method, we demonstrate it is more favorable than NCE by (1) establishing a fast convergence rate and quantifying its dependence on the noise distribution through the variance of stochastic estimators; (2) developing better results in Gaussian mean estimation by showing our method has a much favorable loss landscape and enjoys faster convergence; (3) demonstrating better performance on various applications, including density estimation, out-of-distribution detection, and real image generation.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"48 2","pages":"1949-1960"},"PeriodicalIF":18.6,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145289305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
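The compositional view of the log partition function can be made concrete: with a noise distribution q, log Z = log E_{x~q}[ p~(x)/q(x) ], where the inner expectation is estimated from samples of q. The sketch below (a Gaussian toy, not the paper's estimator) checks this against the analytic Z = sqrt(2*pi) of an unnormalized standard Gaussian:

```python
import math
import random

def log_partition_estimate(log_unnorm, q_sigma=2.0, n=20000, seed=0):
    # log Z = log E_{x~q}[ exp(log_unnorm(x)) / q(x) ]: a compositional function
    # whose inner expectation is estimated with samples from the noise q = N(0, q_sigma^2)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, q_sigma)
        log_q = -0.5 * (x / q_sigma) ** 2 - math.log(q_sigma * math.sqrt(2 * math.pi))
        total += math.exp(log_unnorm(x) - log_q)  # importance weight p~(x)/q(x)
    return math.log(total / n)

# unnormalized standard Gaussian exp(-x^2/2): true partition function is sqrt(2*pi)
est = log_partition_estimate(lambda x: -0.5 * x * x)
true_log_z = 0.5 * math.log(2 * math.pi)
```

The outer log of an inner expectation is exactly the compositional structure the abstract refers to; the variance of the importance weights (governed by the choice of q) controls how accurate the estimate is.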
Pub Date: 2025-10-14 · DOI: 10.1109/TPAMI.2025.3621250
Jie Wen;Yicheng Liu;Chao Huang;Chengliang Liu;Yong Xu;Xiaochun Cao
Fine-tuning pre-trained vision-language models (VLMs) has shown substantial benefits in a wide range of downstream tasks, often achieving impressive performance with minimal labeled data. Parameter-efficient fine-tuning techniques, in particular, have demonstrated their effectiveness in enhancing downstream task performance. However, these methods frequently struggle to generalize to out-of-distribution (OOD) data due to their reliance on non-causal representations, which can introduce biases and spurious correlations that negatively impact decision-making. Such spurious factors hinder the model’s generalization ability beyond the training distribution. To address these challenges, in this paper, we propose a novel causal intervention-based prompt tuning method to adapt VLMs to few-shot OOD generalization. Specifically, we leverage the front-door adjustment technique from causal inference to mitigate the effects of spurious correlations and enhance the model’s focus on causal relationships. Built upon VLMs, our approach begins by decoupling causal and non-causal representations in the vision-language alignment process. The causal representation that captures only essential semantically relevant information can serve as a mediator variable between the input image and output label, mitigating the biases from the latent confounder. To further enrich this causal representation, we propose a novel text-based diversity augmentation technique that uses textual features to provide additional semantic context. This augmentation technique can enhance the diversity of the causal representation, making it more robust and generalizable to various OOD scenarios. Experimental results across multiple OOD datasets demonstrate that our method significantly outperforms existing approaches, achieving state-of-the-art generalization performance.
Title: "Causal Interventional Prompt Tuning for Few-Shot Out-of-Distribution Generalization". IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 2, pp. 1978-1991.
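The front-door adjustment this abstract builds on can be illustrated numerically. The sketch below constructs a small discrete structural causal model with a hidden confounder U (mirroring the latent confounder in the abstract, with the mediator playing the role of the causal representation), applies the textbook front-door formula to purely observational probabilities, and checks that it recovers the true interventional distribution. The specific probabilities are made-up toy values, not anything from the paper.

```python
import itertools

# Toy SCM with hidden confounder U:  U -> X, U -> Y, and X -> M -> Y.
# M satisfies the front-door criterion relative to (X, Y).
p_u = {0: 0.5, 1: 0.5}
p_x_given_u = lambda x, u: (0.2 + 0.6 * u) if x == 1 else 1 - (0.2 + 0.6 * u)
p_m_given_x = lambda m, x: (0.1 + 0.7 * x) if m == 1 else 1 - (0.1 + 0.7 * x)
p_y_given_mu = lambda y, m, u: (0.1 + 0.4 * m + 0.4 * u) if y == 1 \
    else 1 - (0.1 + 0.4 * m + 0.4 * u)

# Exact observational joint P(u, x, m, y) by enumeration.
joint = {}
for u, x, m, y in itertools.product([0, 1], repeat=4):
    joint[(u, x, m, y)] = (p_u[u] * p_x_given_u(x, u)
                           * p_m_given_x(m, x) * p_y_given_mu(y, m, u))

def p(**fixed):
    # Marginal probability of the given variable assignment.
    return sum(pr for (u, x, m, y), pr in joint.items()
               if all({'u': u, 'x': x, 'm': m, 'y': y}[k] == v
                      for k, v in fixed.items()))

def front_door(x_val):
    # P(Y=1 | do(X=x)) = sum_m P(m|x) * sum_x' P(Y=1|m,x') P(x'),
    # using only observational quantities (U never appears).
    total = 0.0
    for m in (0, 1):
        pm = p(m=m, x=x_val) / p(x=x_val)
        inner = sum(p(y=1, m=m, x=xp) / p(m=m, x=xp) * p(x=xp)
                    for xp in (0, 1))
        total += pm * inner
    return total

def true_do(x_val):
    # Ground truth from the mutilated graph (X set by intervention).
    return sum(p_u[u] * p_m_given_x(m, x_val) * p_y_given_mu(1, m, u)
               for u in (0, 1) for m in (0, 1))

print(front_door(1), true_do(1))  # both equal 0.62
```

Because the mediator blocks every directed path from X to Y and the confounder never touches it directly, the front-door estimate matches the interventional ground truth exactly, which is the deconfounding property the proposed method exploits at the representation level.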
Pub Date: 2025-10-14, DOI: 10.1109/TPAMI.2025.3621650
Ngoc-Quan Ha-Phan;Myungsik Yoo
LiDAR perception for autonomous driving offers highly accurate scene depiction in three-dimensional (3D) space. Its most representative task is LiDAR panoptic segmentation (LPS), which unifies instance- and semantic-level segmentation in a holistic manner. Although previous approaches have achieved mature performance, no prior research has explored temporal information to enhance LPS. Because multi-frame processing can improve predictions through richer feature representation and recursive forecasting, as demonstrated in other LiDAR perception tasks, this study proposes an effective, temporally aware panoptic segmentation method for LiDAR point clouds. Specifically, we introduce two modules: a convolution-based cross-frame fusion attention (CFFA) module and an adjacent shifted feature encoder (ASFE) module. The CFFA module fuses multi-frame features using convolution-based attention, whereas the ASFE module leverages adjacent model outputs as an intermediate guide for the final segmentation predictions. Extensive experiments confirm the effectiveness of both modules for LPS. The proposed model achieves impressive panoptic-quality scores on popular benchmarks (63.36% on SemanticKITTI and 78.54% on Panoptic nuScenes), outperforming previous state-of-the-art methods by a significant margin. Further quantitative and qualitative analyses provide evidence of the advantages of multi-frame processing for LPS and demonstrate its behavior under different settings.
Title: "Exploiting the Benefits of Temporal Information in the Realm of LiDAR Panoptic Segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 2, pp. 2048-2065.
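To make the idea of attention-weighted multi-frame fusion concrete, here is a minimal NumPy sketch of one plausible form of convolution-style cross-frame attention: per-pixel frame scores from a channel projection (standing in for a learned 1x1 convolution), a softmax over the time axis, and a weighted sum. This illustrates the general mechanism only, not the authors' CFFA module; the shapes and the scoring weights `w` are assumptions.

```python
import numpy as np

def fuse_frames(feats, w):
    """Fuse T frames of features (T, C, H, W) into one (C, H, W) map.

    w: (C,) channel weights standing in for a learned 1x1 conv that
    scores how much each frame should contribute at each pixel.
    """
    scores = np.einsum('tchw,c->thw', feats, w)        # per-frame, per-pixel scores
    scores -= scores.max(axis=0, keepdims=True)        # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=0, keepdims=True)            # softmax over the T frames
    return np.einsum('tchw,thw->chw', feats, attn)     # attention-weighted sum

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8, 16, 16))   # T=4 frames, C=8 channels, 16x16 map
fused = fuse_frames(feats, rng.normal(size=8))
print(fused.shape)  # (8, 16, 16)
```

With zero scoring weights the softmax is uniform and the fusion reduces to a plain temporal average, which makes the role of the learned attention easy to see: it lets the model deviate from uniform averaging where some frames are more informative.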