Latest publications in IEEE Transactions on Pattern Analysis and Machine Intelligence

A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations.
Pub Date : 2024-08-21 DOI: 10.1109/TPAMI.2024.3447085
Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi

Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models in resource-constrained environments and to accelerate inference, researchers have increasingly explored pruning techniques as a popular research direction in neural network compression. More than three thousand pruning papers have been published from 2020 to 2024. However, there is a dearth of up-to-date comprehensive review papers on pruning. To address this issue, in this survey, we provide a comprehensive review of existing research works on deep neural network pruning in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning and other compression techniques. We then provide a thorough comparative analysis of eight pairs of contrast settings for pruning (e.g., unstructured/structured, one-shot/iterative, data-free/data-driven, initialized/pre-trained weights, etc.) and explore several emerging topics, including pruning for large language models, vision transformers, diffusion models, and large multimodal models, post-training pruning, and different levels of supervision for pruning, to shed light on the commonalities and differences of existing methods and lay the foundation for further method development. Finally, we provide some valuable recommendations on selecting pruning methods and prospect several promising research directions for neural network pruning. To facilitate future research on deep neural network pruning, we summarize broad pruning applications (e.g., adversarial robustness, natural language understanding, etc.) and build a curated collection of datasets, networks, and evaluations on different applications. We maintain a repository at https://github.com/hrcheng1066/awesome-pruning that serves as a comprehensive resource for neural network pruning papers and corresponding open-source code. We will keep updating this repository to include the latest advancements in the field.
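As a concrete instance of the taxonomy's unstructured, one-shot branch, the sketch below implements global magnitude pruning in plain NumPy; it is an illustrative baseline, not a method proposed by the survey.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """One-shot global magnitude pruning: zero out the smallest-magnitude
    weights across all layers until the target sparsity is reached."""
    flat = np.abs(np.concatenate([w.ravel() for w in weights]))
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]       # magnitude at sorted index k
    masks = [np.abs(w) >= threshold for w in weights]
    return [w * m for w, m in zip(weights, masks)], masks

# Toy two-layer network pruned to 80% global sparsity.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
pruned, masks = magnitude_prune(layers, sparsity=0.8)
print([round(float(m.mean()), 2) for m in masks])  # fraction kept per layer
```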

Citations: 0
A Novel and Effective Method to Directly Solve Spectral Clustering.
Pub Date : 2024-08-21 DOI: 10.1109/TPAMI.2024.3447287
Feiping Nie, Chaodie Liu, Rong Wang, Xuelong Li

Spectral clustering has been attracting increasing attention due to its well-defined framework and excellent performance. However, most traditional spectral clustering methods consist of two separate steps: 1) solving a relaxed optimization problem to learn continuous clustering labels, and 2) rounding the continuous clustering labels into discrete ones. This relax-and-discretize strategy inevitably results in information loss and unsatisfactory clustering performance. Moreover, the similarity matrix constructed from the original data may not be optimal for clustering, since data usually contain noise and redundancy. To address these problems, we propose a novel and effective algorithm that directly optimizes the original spectral clustering model, called Direct Spectral Clustering (DSC). We theoretically prove that the original spectral clustering model can be solved by simultaneously learning a weighted discrete indicator matrix and a structured similarity matrix whose number of connected components equals the number of clusters. Both of them can be used to directly obtain the final clustering results without any post-processing. Further, an effective iterative optimization algorithm is developed to solve the proposed model. Extensive experiments performed on synthetic and real-world datasets demonstrate the superiority and effectiveness of the proposed method compared to state-of-the-art algorithms.
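For context, the following NumPy sketch shows the standard relax-and-discretize pipeline that DSC is designed to avoid: bottom eigenvectors of the normalized Laplacian followed by k-means rounding. DSC solves the discrete problem directly; this is only the baseline it improves on.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def relax_and_discretize_sc(X, k, sigma=1.0):
    """Two-step baseline: (1) relax and take the bottom-k eigenvectors of
    the normalized Laplacian, (2) round the continuous labels with k-means,
    which is where the information loss criticized above occurs."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian similarity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # ascending eigenvalues
    Y = vecs[:, :k]                           # continuous relaxation
    _, labels = kmeans2(Y, k, minit='++', seed=0)   # rounding step
    return labels

X = np.vstack([np.random.randn(30, 2) + c for c in ([0, 0], [6, 6], [0, 6])])
print(relax_and_discretize_sc(X, k=3))
```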

Citations: 0
CO-Net++: A Cohesive Network for Multiple Point Cloud Tasks at Once with Two-Stage Feature Rectification.
Pub Date : 2024-08-21 DOI: 10.1109/TPAMI.2024.3447008
Tao Xie, Kun Dai, Qihao Sun, Zhiqiang Jiang, Chuqing Cao, Lijun Zhao, Ke Wang, Ruifeng Li

We present CO-Net++, a cohesive framework that optimizes multiple point cloud tasks collectively across heterogeneous dataset domains with a two-stage feature rectification strategy. The core of CO-Net++ lies in optimizing task-shared parameters to capture universal features across various tasks while discerning task-specific parameters tailored to encapsulate the unique characteristics of each task. Specifically, CO-Net++ develops a two-stage feature rectification strategy (TFRS) that distinctly separates the optimization processes for task-shared and task-specific parameters. In the first stage, TFRS configures all parameters in the backbone as task-shared, which encourages CO-Net++ to thoroughly assimilate universal attributes pertinent to all tasks. In addition, TFRS introduces a sign-based gradient surgery to facilitate the optimization of task-shared parameters, thus alleviating conflicting gradients induced by various dataset domains. In the second stage, TFRS freezes the task-shared parameters and flexibly integrates task-specific parameters into the network for encoding the specific characteristics of each dataset domain. CO-Net++ prominently mitigates the conflicting optimization caused by parameter entanglement, ensuring sufficient identification of universal and specific features. Extensive experiments reveal that CO-Net++ achieves exceptional performance on both 3D object detection and 3D semantic segmentation tasks. Moreover, CO-Net++ delivers an impressive incremental learning capability and prevents catastrophic amnesia when generalizing to new point cloud tasks.
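The abstract does not spell out the sign-based gradient surgery rule, so the sketch below shows one plausible reading: average the per-task gradients but zero out coordinates whose signs disagree across tasks. Treat it as an assumption-laden illustration, not CO-Net++'s actual update.

```python
import torch

def sign_agreement_merge(grads):
    """Hedged sketch of sign-based gradient surgery for task-shared
    parameters: keep the mean gradient only where all tasks agree in sign,
    suppressing conflicting directions across dataset domains."""
    g = torch.stack(grads)                            # (num_tasks, num_params)
    agree = (torch.sign(g) == torch.sign(g[0])).all(dim=0)
    return g.mean(dim=0) * agree                      # zero conflicting coords

g_det = torch.tensor([0.5, -0.2, 0.1])    # hypothetical detection gradient
g_seg = torch.tensor([0.3,  0.4, 0.2])    # hypothetical segmentation gradient
print(sign_agreement_merge([g_det, g_seg]))  # second coord conflicts -> 0
```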

Citations: 0
Q-BENCH: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs.
Pub Date : 2024-08-21 DOI: 10.1109/TPAMI.2024.3445770
Zicheng Zhang, Haoning Wu, Erli Zhang, Guangtao Zhai, Weisi Lin

The rapid development of Multi-modality Large Language Models (MLLMs) has driven a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs on low-level visual perception and understanding remains a yet-to-explore domain. To this end, we design benchmark settings to emulate human language responses related to low-level vision: low-level visual perception (A1), via visual question answering related to low-level attributes (e.g., clarity, lighting); and low-level visual description (A2), evaluating MLLMs on low-level text descriptions. Furthermore, given that pairwise comparison can better avoid ambiguity of responses and has been adopted by many human experiments, we further extend the low-level perception-related question-answering and description evaluations of MLLMs from single images to image pairs. Specifically, for perception (A1), we construct the LLVisionQA+ dataset, comprising 2,990 single images and 1,999 image pairs, each accompanied by an open-ended question about its low-level features; for description (A2), we propose the LLDescribe+ dataset, evaluating MLLMs on low-level descriptions of 499 single images and 450 pairs. Additionally, we evaluate MLLMs on assessment (A3) ability, i.e., predicting scores, by employing a softmax-based approach to enable all MLLMs to generate quantifiable quality ratings, tested against human opinions in 7 image quality assessment (IQA) datasets. With 24 MLLMs under evaluation, we demonstrate that several MLLMs have decent low-level visual competencies on single images, but only GPT-4V exhibits higher accuracy on pairwise comparisons than on single-image evaluations (like humans). We hope that our benchmark will motivate further research into uncovering and enhancing these nascent capabilities of MLLMs. Datasets will be available at https://github.com/Q-Future/Q-Bench.
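The softmax-based scoring in A3 can be summarized in a few lines: rather than parsing free-form text, read the model's logits for two opposing rating tokens and convert them into a scalar. The token pair "good"/"poor" below is illustrative; the benchmark's exact token set may differ.

```python
import numpy as np

def softmax_quality_score(logit_good, logit_poor):
    """Turn the logits of two opposing rating tokens into a quality score
    in [0, 1]: the softmax probability mass placed on the positive token."""
    z = np.array([logit_good, logit_poor])
    e = np.exp(z - z.max())                   # numerically stable softmax
    return e[0] / e.sum()

print(softmax_quality_score(2.1, 0.3))        # ~0.86: the model leans "good"
```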

Citations: 0
Tensorized and Compressed Multi-view Subspace Clustering via Structured Constraint.
Pub Date : 2024-08-20 DOI: 10.1109/TPAMI.2024.3446537
Wei Chang, Huimin Chen, Feiping Nie, Rong Wang, Xuelong Li

Multi-view learning has attracted increasing attention in recent years. However, traditional approaches focus only on the differences among views while ignoring their consistency. This may render views that contain abnormal or noisy data ineffective during view learning. Moreover, current datasets are increasingly high-dimensional and large-scale. Therefore, this paper proposes a novel multi-view compressed subspace learning method via a low-rank tensor constraint, which incorporates the clustering process and multi-view learning into a unified framework. First, for each view, we take a subset of the samples to build a small-size dictionary, which greatly reduces both the effect of redundant information and the computation cost. Then, to find the consistency and differences among views, we impose a low-rank tensor constraint on these representations and further design an auto-weighted mechanism to learn the optimal representation. Last, since the learned representation is non-square, a bipartite graph is introduced; under the structured constraint, the clustering results can be obtained directly from this graph without any post-processing. Extensive experiments on synthetic and real-world benchmark datasets demonstrate the efficacy and efficiency of our method, especially for views with noise or outliers.
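Low-rank tensor constraints of this kind are typically enforced with the t-SVD-based tensor nuclear norm, whose proximal step is sketched below on a stacked view tensor; the paper's full solver also handles the dictionary, auto-weighting, and structured bipartite-graph terms, which are omitted here.

```python
import numpy as np

def tnn_prox(T, tau):
    """Proximal operator of the t-SVD tensor nuclear norm: FFT along the
    view axis, soft-threshold the singular values of each frontal slice,
    then inverse FFT back to the original domain."""
    F = np.fft.fft(T, axis=2)
    out = np.empty_like(F)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(F[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(out, axis=2))

Z = np.random.randn(30, 30, 3)                # n x n x num_views tensor
Z_low = tnn_prox(Z, tau=3.0)
print(np.linalg.norm(Z), np.linalg.norm(Z_low))  # shrinkage toward low rank
```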

Citations: 0
Approaching the Global Nash Equilibrium of Non-convex Multi-player Games.
Pub Date : 2024-08-19 DOI: 10.1109/TPAMI.2024.3445666
Guanpu Chen, Gehui Xu, Fengxiang He, Yiguang Hong, Leszek Rutkowski, Dacheng Tao

Many machine learning problems can be formulated as non-convex multi-player games. Due to non-convexity, it is challenging to obtain the existence condition of the global Nash equilibrium (NE) and design theoretically guaranteed algorithms. This paper studies a class of non-convex multi-player games, where players' payoff functions consist of canonical functions and quadratic operators. We leverage conjugate properties to transform the complementary problem into a variational inequality (VI) problem using a continuous pseudo-gradient mapping. We prove the existence condition of the global NE, as the solution to the VI problem satisfies a duality relation. We then design an ordinary differential equation to approach the global NE with an exponential convergence rate. For practical implementation, we derive a discretized algorithm and apply it to two scenarios: multi-player games with generalized monotonicity and multi-player potential games. In the two settings, step sizes are required to be O(1/k) and O(1/√k) to yield convergence rates of O(1/k) and O(1/√k), respectively. Extensive experiments on robust neural network training and sensor network localization validate our theory. Our code is available at https://github.com/GuanpuChen/Global-NE.
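A minimal discretization of these dynamics on a toy game illustrates the O(1/k) step-size schedule; the payoffs below are hypothetical quadratics with unique NE (1, 1), far simpler than the canonical-plus-quadratic class the paper covers.

```python
def pseudo_gradient_play(steps=2000):
    """Discretized pseudo-gradient dynamics on a toy two-player game
    f1(x, y) = (x - y)^2, f2(x, y) = (y - 1)^2, whose unique NE is (1, 1).
    Each player descends the gradient of its own payoff simultaneously."""
    x, y = 5.0, -3.0
    for k in range(1, steps + 1):
        eta = 1.0 / k                       # O(1/k) step size, as in the text
        gx = 2 * (x - y)                    # d f1 / d x
        gy = 2 * (y - 1)                    # d f2 / d y
        x, y = x - eta * gx, y - eta * gy   # simultaneous updates
    return x, y

print(pseudo_gradient_play())               # -> approximately (1.0, 1.0)
```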

Citations: 0
A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective.
Pub Date : 2024-08-19 DOI: 10.1109/TPAMI.2024.3445463
Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
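As a reference point for the local neighborhood aggregation shared by most GNN families the survey organizes, here is a single symmetrically normalized graph convolution in NumPy; graph Transformers replace this fixed aggregation with attention over the graph structure.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W),
    i.e., aggregate features over self-looped neighborhoods, transform,
    and apply a nonlinearity."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = 1.0 / np.sqrt(A_hat.sum(1))
    A_norm = d[:, None] * A_hat * d[None, :]  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # 3-node path graph
X = np.random.randn(3, 4)                                # node features
W = np.random.randn(4, 2)                                # learnable weights
print(gcn_layer(A, X, W).shape)                          # (3, 2)
```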

Citations: 0
Unsupervised Part Discovery via Dual Representation Alignment.
Pub Date : 2024-08-19 DOI: 10.1109/TPAMI.2024.3445582
Jiahao Xia, Wenjian Huang, Min Xu, Jianguo Zhang, Haimin Zhang, Ziyu Sheng, Dong Xu

Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning still has not received as much attention as other vision tasks. Previous research has established that Vision Transformers can learn instance-level attention without labels, extracting high-quality instance-level representations for boosting downstream tasks. In this paper, we achieve unsupervised part-specific attention learning using a novel paradigm and further employ the part representations to improve part discovery performance. Specifically, paired images are generated from the same image with different geometric transformations, and multiple part representations are extracted from these paired images using a novel module, named PartFormer. These part representations from the paired images are then exchanged to improve geometric transformation invariance. Subsequently, the part representations are aligned with the feature map extracted by a feature map encoder, achieving high similarity with the pixel representations of the corresponding part regions and low similarity in irrelevant regions. Finally, geometric and semantic constraints are applied to the part representations through the intermediate alignment results for part-specific attention learning, encouraging the PartFormer to focus locally and the part representations to explicitly include the information of the corresponding parts. Moreover, the aligned part representations can further serve as a series of reliable detectors in the testing phase, predicting pixel masks for part discovery. Extensive experiments are carried out on four widely used datasets, and our results demonstrate that the proposed method achieves competitive performance and robustness due to its part-specific attention. The code will be released upon paper acceptance.
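At test time, the aligned part representations act as detectors that predict pixel masks. The sketch below shows one natural reading of that step, scoring pixels by cosine similarity to each part and softmaxing over parts; the shapes and temperature are illustrative, not the paper's exact design.

```python
import numpy as np

def part_masks(pixel_feats, part_embs, temperature=0.1):
    """Soft part assignment per pixel: cosine similarity between pixel
    features and part embeddings, followed by a softmax over parts."""
    P = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    Q = part_embs / np.linalg.norm(part_embs, axis=-1, keepdims=True)
    sim = P @ Q.T / temperature               # (num_pixels, num_parts)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

feats = np.random.randn(64 * 64, 32)          # flattened H*W feature map
parts = np.random.randn(5, 32)                # 5 learned part embeddings
masks = part_masks(feats, parts)
print(masks.shape, masks.sum(axis=1)[:3])     # rows sum to 1
```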

Citations: 0
SEA++: Multi-Graph-based Higher-Order Sensor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation.
Pub Date : 2024-08-16 DOI: 10.1109/TPAMI.2024.3444904
Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen

Unsupervised Domain Adaptation (UDA) methods have been successful in reducing label dependency by minimizing the domain discrepancy between labeled source domains and unlabeled target domains. However, these methods face challenges when dealing with Multivariate Time-Series (MTS) data. MTS data typically originates from multiple sensors, each with its unique distribution. This property poses difficulties in adapting existing UDA techniques, which mainly focus on aligning global features while overlooking the distribution discrepancies at the sensor level, thus limiting their effectiveness for MTS data. To address this issue, a practical domain adaptation scenario is formulated as Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA, aiming to address domain discrepancy at both local and global sensor levels. At the local sensor level, we design endo-feature alignment, which aligns sensor features and their correlations across domains. To reduce domain discrepancy at the global sensor level, we design exo-feature alignment that enforces restrictions on global sensor features. We further extend SEA to SEA++ by enhancing the endo-feature alignment. Particularly, we incorporate multi-graph-based higher-order alignment for both sensor features and their correlations. Extensive empirical results have demonstrated the state-of-the-art performance of our SEA and SEA++ on six public MTS datasets for MTS-UDA.
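Sensor-level alignment can be pictured as a discrepancy loss computed per sensor rather than on pooled global features. The sketch below uses a squared RBF-kernel MMD for that role; it is a stand-in for SEA's endo-feature alignment, which additionally aligns sensor correlations and exploits multi-graph higher-order structure.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between two samples under an RBF
    kernel; summing it per sensor gives a local alignment loss."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Toy MTS batches: (batch, sensors, features) for source and target domains.
src = np.random.randn(32, 6, 16)
tgt = np.random.randn(32, 6, 16) + 0.5        # shifted target domain
loss = sum(rbf_mmd2(src[:, s], tgt[:, s]) for s in range(src.shape[1]))
print(loss)   # per-sensor discrepancy to be minimized during adaptation
```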

Citations: 0
T-Net++: Effective Permutation-Equivariance Network for Two-View Correspondence Pruning.
Pub Date : 2024-08-16 DOI: 10.1109/TPAMI.2024.3444457
Guobao Xiao, Xin Liu, Zhen Zhong, Xiaoqin Zhang, Jiayi Ma, Haibin Ling

We propose a conceptually novel, flexible, and effective framework (named T-Net++) for the task of two-view correspondence pruning. T-Net++ comprises two unique structures: the "-" structure and the "|" structure. The "-" structure utilizes an iterative learning strategy to process correspondences, while the "|" structure integrates all feature information of the "-" structure and produces inlier weights. Moreover, within the "|" structure, we design a new Local-Global Attention Fusion module to fully exploit valuable information obtained from concatenating features through channel-wise and spatial-wise relationships. Furthermore, we develop a Channel-Spatial Squeeze-and-Excitation module, a modified network backbone that enhances the representation ability of important channels and correspondences through the squeeze-and-excitation operation. T-Net++ not only preserves the permutation-equivariance manner for correspondence pruning, but also gathers rich contextual information, thereby enhancing the effectiveness of the network. Experimental results demonstrate that T-Net++ outperforms other state-of-the-art correspondence pruning methods on various benchmarks and excels in two extended tasks. Our code will be available at https://github.com/guobaoxiao/T-Net.
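The channel half of the Channel-Spatial Squeeze-and-Excitation module builds on the standard squeeze-and-excitation operation, sketched below for a batch of putative correspondences; the spatial branch and the paper's specific modifications are omitted.

```python
import torch
import torch.nn as nn

class ChannelSE(nn.Module):
    """Standard channel squeeze-and-excitation: global-average "squeeze"
    over correspondences, bottleneck MLP "excitation", then channel-wise
    reweighting of the input features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, channels, num_corrs)
        w = self.fc(x.mean(dim=2))        # squeeze over correspondences
        return x * w.unsqueeze(-1)        # excite: reweight channels

x = torch.randn(2, 128, 500)              # 500 putative correspondences
print(ChannelSE(128)(x).shape)            # torch.Size([2, 128, 500])
```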

Citations: 0