
Latest publications in IEEE Transactions on Pattern Analysis and Machine Intelligence

An Efficient Multi-Estimation-Based Parameter Centroid Decision Via Linear Regression Approach.
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tpami.2026.3653765
Yeongyu Choi, Fabien Moutarde, Ju H Park, Ho-Youl Jung
We propose a novel post-processing approach for the local optimization of Locally Optimized RANdom SAmple Consensus (LO-RANSAC), called the Multi-Estimation-based Parameter Centroid (MEPC) decision. It is observed that the optimal thresholds for hypothesis generation and evaluation differ in local optimization with the inner RANSAC. Instead of binary labeling for inliers and outliers, a new ternary labeling for inliers, midliers, and outliers is introduced, using two thresholds. Our experimental results show that the highest-scoring model measured by the ternary method is closer to the real model than that measured by the existing binary method. However, it should be noted that the highest score still does not correspond to the best model due to inaccurate evaluation by data noise. We introduce a new linear model centroid decision method to compensate for the highest-scoring model distorted by noise. In this process, an efficient method for measuring the similarity between two hypotheses is introduced, and candidates close to the real model are found by comparing their similarity with the highest-scoring model. Our approach determines a representative model of the multiple candidate hypotheses, which is defined as the geometric centroid of hyperplanes. We test on various datasets for homography, fundamental, and essential matrices, demonstrating that applying MEPC to existing RANSAC algorithms achieves more accurate and stable model estimation. Moreover, additional experiments on vanishing point detection show the potential of our approach for various model estimation applications.
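The two ideas at the core of this abstract, ternary residual labeling with two thresholds and a centroid taken over multiple candidate hypotheses, can be illustrated with a short sketch. The Python below is not the authors' MEPC implementation: the threshold values, the reduced midlier weight, the toy line-fitting setup, and the score-weighted averaging used as a stand-in for the geometric centroid of hyperplanes are all illustrative assumptions.

```python
import numpy as np

def ternary_labels(residuals, t_in, t_mid):
    """Label each residual: 0 = inlier, 1 = midlier, 2 = outlier (t_in < t_mid)."""
    labels = np.full(residuals.shape, 2, dtype=int)
    labels[residuals < t_mid] = 1
    labels[residuals < t_in] = 0
    return labels

def ternary_score(residuals, t_in=1.0, t_mid=3.0, w_mid=0.5):
    """Score a hypothesis: inliers count fully, midliers with a reduced weight."""
    labels = ternary_labels(residuals, t_in, t_mid)
    return np.sum(labels == 0) + w_mid * np.sum(labels == 1)

def parameter_centroid(hypotheses, scores):
    """Score-weighted mean of candidate parameter vectors (illustrative centroid)."""
    w = np.asarray(scores, dtype=float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(hypotheses, dtype=float)).sum(axis=0)

# Toy usage: three noisy hypotheses y = a*x + b for the same point set.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 200)
candidates = np.array([[2.05, 0.9], [1.95, 1.1], [2.4, 0.2]])  # (a, b) pairs
scores = [ternary_score(np.abs(y - (a * x + b))) for a, b in candidates]
print("centroid model:", parameter_centroid(candidates, scores))
```

In the method described above, the centroid is taken only over candidates judged similar to the highest-scoring model; the sketch averages all candidates for brevity.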
Citations: 0
Learn to Enhance Sparse Spike Streams.
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tpami.2026.3653768
Liwen Hu, Yijia Guo, Mianzhi Liu, Yiming Fan, Rui Ma, Shengbo Chen, Lei Ma, Tiejun Huang
High-speed vision tasks have long been a challenge in computer vision. Recently, the spike camera has shown great potential in these tasks due to its high temporal resolution. Unlike traditional cameras, it emits asynchronous spike signals to capture visual information. However, under low-light conditions, spike signals become highly sparse, and the sparse spike stream severely hinders the effectiveness of existing spike-based methods in high-speed scenarios. To address this challenge, we introduce SS2DS, the first deep learning framework that enhances sparse spike streams into dense spike streams. SS2DS first estimates the spike firing frequency within sparse streams. Subsequently, the spike firing frequency is enhanced by a neural network. Finally, SS2DS decodes the enhanced spike stream from the enhanced spike firing frequency sequence. SS2DS can adjust the temporal distribution of sparse spike streams and mitigate the performance degradation of existing methods in low-light and high-speed scenarios. To evaluate sparse spike stream enhancement, we construct both synthetic and real sparse spike stream datasets. The real dataset is collected in dynamic scenarios using the third-generation spike camera. Comparing the reconstruction results, enhanced spike streams achieve an average improvement of +0.78 MA, -18.42 BRISQUE, and -1.42 NIQE over sparse spike streams. Moreover, the enhanced spike streams also benefit other spike-based vision tasks, such as 3D reconstruction (+1.325 dB PSNR, +0.005 SSIM, and -0.01 LPIPS) and super-resolution (+0.63 MA, -13.67 BRISQUE, and -1.28 NIQE). Code and datasets will be released after publication.
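The first stage described above, estimating a spike firing frequency from a sparse binary spike stream, can be approximated by a sliding-window spike count, and the final decoding stage by sampling spikes from the enhanced frequency. The sketch below makes several assumptions not in the abstract: the (T, H, W) binary layout, the 32-step window, and a fixed gain standing in for the learned enhancement network.

```python
import numpy as np

def spike_firing_frequency(spikes, window=32):
    """Per-pixel firing frequency of a binary spike stream.

    spikes: (T, H, W) array of {0, 1}. Returns (T, H, W) values in [0, 1],
    computed as a causal moving average over the last `window` steps.
    """
    T = spikes.shape[0]
    cumsum = np.cumsum(spikes.astype(np.float32), axis=0)
    freq = np.empty_like(cumsum)
    for t in range(T):
        lo = max(0, t - window + 1)
        total = cumsum[t] - (cumsum[lo - 1] if lo > 0 else 0.0)
        freq[t] = total / (t - lo + 1)
    return freq

def decode_spikes(freq, rng=None):
    """Decode a spike stream by Bernoulli sampling of the (enhanced) frequency."""
    if rng is None:
        rng = np.random.default_rng(0)
    return (rng.random(freq.shape) < freq).astype(np.uint8)

# Toy usage: sparse random stream -> frequency estimate -> brighter re-decoded stream.
rng = np.random.default_rng(1)
sparse = (rng.random((128, 16, 16)) < 0.05).astype(np.uint8)
freq = spike_firing_frequency(sparse)
dense = decode_spikes(np.clip(freq * 4.0, 0.0, 1.0))  # fixed gain stands in for the network
print("mean firing rate:", sparse.mean(), "->", dense.mean())
```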
Citations: 0
Revisiting 360 Depth Estimation With PanoGabor: A New Fusion Perspective.
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tpami.2026.3653796
Zhijie Shen, Chunyu Lin, Lang Nie, Kang Liao, Weisi Lin, Yao Zhao
Depth estimation from a monocular 360 image is important to the perception of the entire 3D environment. However, the inherent distortion and large field of view (FoV) in 360 images pose great challenges for this task. To this end, existing mainstream solutions typically introduce additional perspective-based 360 representations (e.g., Cubemap) to achieve effective feature extraction. Nevertheless, regardless of the introduced representations, they eventually need to be unified into the equirectangular projection (ERP) format for the subsequent depth estimation, which inevitably reintroduces additional distortions. In this work, we propose an oriented-distortion-aware Gabor Fusion framework (PGFuse) to address the above challenges. First, we introduce Gabor filters that analyze texture in the frequency domain, extending the receptive fields and enhancing depth cues. To address the reintroduced distortions, we design a latitude-aware distortion representation to generate customized, distortion-aware Gabor filters (PanoGabor filters). Furthermore, we design a channel-wise and spatial-wise unidirectional fusion module (CS-UFM) that integrates the proposed PanoGabor filters to unify other representations into the ERP format, delivering effective and distortion-aware features. Considering the orientation sensitivity of the Gabor transform, we further introduce a spherical gradient constraint to stabilize this sensitivity. Experimental results on three popular indoor 360 benchmarks demonstrate the superiority of the proposed PGFuse to existing state-of-the-art solutions. Code and models will be available at https://github.com/zhijieshen-bjtu/PGFuse.
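The notion of a latitude-aware, distortion-aware Gabor filter can be illustrated by stretching a standard Gabor kernel horizontally in proportion to the equirectangular distortion at a given latitude (roughly 1/cos(latitude)). The kernel size, wavelength, and nearest-neighbour resampling below are illustrative assumptions, not the PanoGabor construction from the paper.

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Standard real Gabor kernel: cosine carrier under an anisotropic Gaussian."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def pano_gabor_kernel(latitude_rad, **kwargs):
    """Latitude-aware Gabor: widen the kernel horizontally by the ERP factor 1/cos(lat)."""
    k = gabor_kernel(**kwargs)
    stretch = 1.0 / max(np.cos(latitude_rad), 1e-3)
    size = k.shape[1]
    # Nearest-neighbour resampling of columns onto a horizontally stretched grid.
    src = np.round((np.arange(size) - size // 2) / stretch + size // 2)
    return k[:, np.clip(src, 0, size - 1).astype(int)]

# The kernel at 60 degrees latitude has wider horizontal support than at the equator.
support = lambda k: int((np.abs(k).max(axis=0) > 0.1).sum())
print("columns above 0.1:", support(pano_gabor_kernel(0.0)),
      "(equator) vs", support(pano_gabor_kernel(np.deg2rad(60.0))), "(60 deg)")
```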
Citations: 0
Goal-guided Prompting with Adaptive Modality Selection for Efficient Assembly Activity Anticipation in Egocentric Videos.
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1109/tpami.2026.3653482
Tianshan Liu, Bing-Kun Bao
With egocentric observation and multimodal perception built into augmented reality (AR) devices, the next generation of smart assistants has the potential to reduce human labor and enhance execution efficiency in assembly tasks. Among diverse assembly activity understanding tasks, anticipating near-future activities is crucial yet challenging, as it can help humans or agents actively plan and engage in interactions with the environment. However, existing egocentric activity anticipation methods still struggle to achieve a decent trade-off between accuracy and computational efficiency, which hinders their deployment in practical applications. To address this dilemma, in this paper, we propose a goal-guided prompting framework with adaptive modality selection (GP-AMS) for assembly activity anticipation in egocentric videos. To bridge the semantic gap between historical observations and unobserved future activities, we inject the inferred high-level goal clues into the constructed prompts, which are further utilized to guide a pre-trained vision-language (V-L) model to compensate for the relevant semantics of the unseen future. Moreover, a mask-and-predict strategy is adopted with two imposed constraints, i.e., causal masking and probabilistic token-dropping, to mine the intrinsic associations between the assembly activities within a specific procedure. To retain the benefits of exploiting multimodal information while avoiding a large increase in computational burden, an adaptive modality selection strategy is designed to train a policy network, which learns to dynamically decide which modalities should be sampled for processing by the anticipation model at each observation time step. By allocating the major computation to the selected indicative modalities on the fly, the efficiency of the overall model can be improved, paving the way for deployment on real-world devices. Extensive experimental results on two public data sets validate that the proposed method yields not only consistent improvements in anticipation accuracy but also significant savings in computation budgets.
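The adaptive modality selection idea, a lightweight policy that decides at each observed time step which modalities are worth running through the heavy encoders, can be sketched as a per-step gating function. The summary-feature size, the linear policy with random weights, and the relative encoder costs below are hypothetical; in GP-AMS the policy network is trained, and its selections feed the anticipation model.

```python
import numpy as np

def select_modalities(obs_summary, weights, bias, threshold=0.5):
    """Per-time-step modality gate: linear policy + sigmoid + hard threshold.

    obs_summary: (T, d) cheap per-step summary features.
    weights: (d, M), bias: (M,) for M modalities (e.g., RGB, depth, gaze).
    Returns a (T, M) boolean mask of modalities to run through the heavy encoders.
    """
    logits = obs_summary @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

# Toy usage: 8 observed time steps, 16-d summaries, 3 modalities with unequal costs.
rng = np.random.default_rng(2)
summaries = rng.normal(size=(8, 16))
W, b = 0.3 * rng.normal(size=(16, 3)), np.zeros(3)   # untrained, illustrative policy
mask = select_modalities(summaries, W, b)
cost = np.array([1.0, 0.6, 0.2])                      # assumed relative encoder costs
print("selected modalities per step:\n", mask.astype(int))
print("compute used vs. always-on:",
      float((mask * cost).sum()), "/", float(mask.shape[0] * cost.sum()))
```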
Citations: 0
Human Motion Prediction via Continual Prior Compensation
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3651530
Jianwei Tang, Jian-Fang Hu, Tianming Liang, Xiaotong Lin, Jiangxin Sun, Wei-Shi Zheng, Jianhuang Lai
{"title":"Human Motion Prediction via Continual Prior Compensation","authors":"Jianwei Tang, Jian-Fang Hu, Tianming Liang, Xiaotong Lin, Jiangxin Sun, Wei-Shi Zheng, Jianhuang Lai","doi":"10.1109/tpami.2026.3651530","DOIUrl":"https://doi.org/10.1109/tpami.2026.3651530","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"15 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3650864
Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu, Guang Chen, Liqing Zhang, Changjun Jiang
{"title":"Parse, Align and Aggregate: Graph-driven Compositional Reasoning for Video Question Answering","authors":"Jiangtong Li, Zhaohe Liao, Fengshun Xiao, Tianjiao Li, Qiang Zhang, Haohua Zhao, Li Niu, Guang Chen, Liqing Zhang, Changjun Jiang","doi":"10.1109/tpami.2026.3650864","DOIUrl":"https://doi.org/10.1109/tpami.2026.3650864","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"48 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3651319
Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink
{"title":"Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models","authors":"Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink","doi":"10.1109/tpami.2026.3651319","DOIUrl":"https://doi.org/10.1109/tpami.2026.3651319","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"27 1","pages":"1-20"},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning Physics-Informed Noise Models from Dark Frames for Low-Light Raw Image Denoising
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3651447
Hansen Feng, Lizhi Wang, Yiqi Huang, Yuzhi Wang, Lin Zhu, Hua Huang
{"title":"Learning Physics-Informed Noise Models from Dark Frames for Low-Light Raw Image Denoising","authors":"Hansen Feng, Lizhi Wang, Yiqi Huang, Yuzhi Wang, Lin Zhu, Hua Huang","doi":"10.1109/tpami.2026.3651447","DOIUrl":"https://doi.org/10.1109/tpami.2026.3651447","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"243 1","pages":"1-18"},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HGNNv2: Stable Hypergraph Neural Networks
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3652225
Yue Gao, Jielong Yan, Yifan Feng, Xiangmin Han, Shihui Ying, Zongze Wu, Han Hu
{"title":"HGNNv2: Stable Hypergraph Neural Networks","authors":"Yue Gao, Jielong Yan, Yifan Feng, Xiangmin Han, Shihui Ying, Zongze Wu, Han Hu","doi":"10.1109/tpami.2026.3652225","DOIUrl":"https://doi.org/10.1109/tpami.2026.3652225","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"27 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust Distributed Cooperative Classification with Learned Compressed-Feature Diffusion
IF 23.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1109/tpami.2026.3652297
Xiling Yao, Jie Chen, Jingdong Chen
{"title":"Robust Distributed Cooperative Classification with Learned Compressed-Feature Diffusion","authors":"Xiling Yao, Jie Chen, Jingdong Chen","doi":"10.1109/tpami.2026.3652297","DOIUrl":"https://doi.org/10.1109/tpami.2026.3652297","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"39 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0