
Latest Articles in Pattern Recognition

TranSAC: An unsupervised transferability metric based on task speciality and domain commonality
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113137
Qianshan Zhan, Xiao-Jun Zeng, Qian Wang
In transfer learning, one fundamental problem is transferability estimation, where a metric measures transfer performance without training. Existing metrics face two issues: 1) requiring target domain labels, and 2) only focusing on task speciality but ignoring equally important domain commonality. To overcome these limitations, we propose TranSAC, a Transferability metric based on task Speciality And domain Commonality, capturing the separation between classes and the similarity between domains. Its main advantages are that it is 1) unsupervised, 2) fine-tuning free, and 3) applicable to source-dependent and source-free transfer scenarios. To achieve this, we investigate the upper and lower bounds of transfer performance based on fixed representations extracted from the pre-trained model. Theoretical results reveal that unsupervised transfer performance is characterized by entropy-based quantities, naturally reflecting task speciality and domain commonality. These insights motivate the design of TranSAC, which integrates both factors to enhance transferability. Extensive experiments are performed across 12 target datasets with 36 pre-trained models, including supervised CNNs, self-supervised CNNs, and ViTs. Results demonstrate the importance of domain commonality and task speciality, and establish TranSAC as superior to state-of-the-art metrics for pre-trained model ranking, target domain ranking, and source domain ranking.
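As a rough illustration of the entropy-based scoring described above, the sketch below combines a task-speciality term (low entropy of soft class assignments computed from fixed target features) with a domain-commonality term (closeness of source and target feature statistics). All formulas, names, and weights here are placeholder assumptions for illustration; they are not the TranSAC definitions from the paper.

```python
# Illustrative sketch only: a generic entropy-based transferability score in the
# spirit of "task speciality + domain commonality". Not the TranSAC formulation.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def toy_transferability_score(target_feats, class_prototypes, source_feats):
    """target_feats: (n, d) fixed features of unlabeled target samples.
    class_prototypes: (K, d) per-class prototypes from the pre-trained source head.
    source_feats: (m, d) features of source-domain samples."""
    # Task speciality: sharp (low-entropy) soft assignments suggest well-separated classes.
    logits = target_feats @ class_prototypes.T             # (n, K) similarity scores
    p = softmax(logits, axis=1)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1).mean()  # mean assignment entropy
    task_speciality = -entropy                              # higher = better separated

    # Domain commonality: closeness of first-order feature statistics across domains.
    gap = np.linalg.norm(target_feats.mean(0) - source_feats.mean(0))
    domain_commonality = -gap                               # higher = more similar domains

    return task_speciality + domain_commonality             # larger score = expected easier transfer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    score = toy_transferability_score(rng.normal(size=(100, 16)),
                                      rng.normal(size=(5, 16)),
                                      rng.normal(size=(80, 16)))
    print(f"toy transferability score: {score:.3f}")
```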
Citations: 0
Generating transferable attacks across large vision-language models using adversarial deformation learning
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113194
Daizong Liu, Wangqin Liu, Xiaowen Cai, Pan Zhou, Runwei Guan, Xiaoye Qu, Bo Du
Large Vision-Language Models (LVLMs) have achieved remarkable capabilities in understanding and generating content across diverse modalities, yet their vulnerability to adversarial attacks raises critical security concerns. Traditional adversarial attacks often craft noise manipulations specific to a particular LVLM, which limits their transferability and hinders their effectiveness against unseen LVLMs. To address this challenge, we propose a unified adversarial learning framework that enhances transferability by jointly optimizing robust perturbations for both vision and language modalities. In addition to producing perturbations on both the input image and the prompt, our approach also introduces multi-modal purification/transformation networks within an adversarial learning scheme, which learn worst-case distortions to evade the harmfulness of visual and textual perturbations while enforcing semantic and visual consistency, creating a generalizable training environment for the adversarial examples. Our core insight is to force the adversarial examples to resist the most harmful distortions produced by these networks, thereby improving their transferability. Experiments demonstrate that our attack achieves significantly higher transfer-attack success rates compared to existing works, revealing critical robustness gaps in LVLMs.
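The min-max idea sketched in the abstract (adversarial examples trained to survive a learned purification step) can be illustrated with a toy loop. The tiny classifier and purifier below are stand-ins, and the alternating updates are an assumed simplification rather than the paper's LVLM attack pipeline.

```python
# Toy min-max sketch: a perturbation is optimized to stay effective even after a
# learned "purification" network tries to undo it. All modules are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 10))  # frozen victim stand-in
for p in classifier.parameters():
    p.requires_grad = False
purifier = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))               # learns to undo the attack
opt_pur = torch.optim.Adam(purifier.parameters(), lr=1e-3)

x = torch.rand(4, 3, 16, 16)
y = torch.randint(0, 10, (4,))
delta = torch.zeros_like(x, requires_grad=True)
opt_delta = torch.optim.Adam([delta], lr=1e-2)

for step in range(50):
    adv = (x + delta).clamp(0, 1)
    # Inner step: the purifier minimizes the attack's effect (worst case for the attacker).
    opt_pur.zero_grad()
    F.cross_entropy(classifier(purifier(adv.detach())), y).backward()
    opt_pur.step()
    # Outer step: the perturbation maximizes loss on both the raw and the purified input.
    opt_delta.zero_grad()
    loss = -(F.cross_entropy(classifier(adv), y) +
             F.cross_entropy(classifier(purifier(adv)), y))
    loss.backward()
    opt_delta.step()
    delta.data.clamp_(-8 / 255, 8 / 255)                              # keep the perturbation small

print("final attack loss:", -loss.item())
```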
Citations: 0
Data-efficient generalization for zero-shot composed image retrieval
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1016/j.patcog.2026.113187
Zining Chen, Zhicheng Zhao, Fei Su, Shijian Lu
Zero-shot Composed Image Retrieval (ZS-CIR) aims to retrieve the target image based on a reference image and a text description without requiring in-distribution triplets for training. One prevalent approach follows the vision-language pretraining paradigm that employs a mapping network to transfer the image embedding to a pseudo-word token in the text embedding space. However, this approach tends to impede network generalization due to modality discrepancy and distribution shift between training and inference. To this end, we propose a Data-efficient Generalization (DeG) framework, including two novel designs, namely, Textual Supplement (TS) module and Semantic Sample Pool (SSP) module. The TS module exploits compositional textual semantics during training, enhancing the pseudo-word token with more linguistic semantics and thus mitigating the modality discrepancy effectively. The SSP module exploits the zero-shot capability of pretrained Vision-Language Models (VLMs), alleviating the distribution shift and mitigating the overfitting issue from the redundancy of the large-scale image-text data. Extensive experiments over four ZS-CIR benchmarks show that DeG outperforms the state-of-the-art (SOTA) methods with much less training data, and saves substantial training and inference time for practical usage.
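A minimal sketch of the mapping-network idea mentioned above is given below: an MLP maps an image embedding to a pseudo-word token that is spliced into the text token sequence. The dimensions, depth, and insertion scheme are assumptions for illustration, not the DeG architecture.

```python
# Minimal sketch of a textual-inversion style mapping network for ZS-CIR:
# an MLP turns an image embedding into a pseudo-word token embedding.
import torch
import torch.nn as nn

class PseudoTokenMapper(nn.Module):
    def __init__(self, img_dim=512, tok_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.GELU(),
            nn.Linear(hidden, tok_dim),
        )

    def forward(self, img_emb):                      # (B, img_dim)
        return self.net(img_emb)                     # (B, tok_dim) pseudo-word token

def insert_pseudo_token(text_tok_embs, pseudo_tok, position):
    """Splice the pseudo-word token into a sequence of text token embeddings.
    text_tok_embs: (B, L, tok_dim); pseudo_tok: (B, tok_dim)."""
    return torch.cat(
        [text_tok_embs[:, :position], pseudo_tok.unsqueeze(1), text_tok_embs[:, position:]],
        dim=1,
    )

if __name__ == "__main__":
    mapper = PseudoTokenMapper()
    img_emb = torch.randn(2, 512)                    # stand-in for a CLIP image embedding
    text_embs = torch.randn(2, 10, 512)              # stand-in for tokenized caption embeddings
    seq = insert_pseudo_token(text_embs, mapper(img_emb), position=3)
    print(seq.shape)                                 # torch.Size([2, 11, 512])
```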
Citations: 0
A communication efficient boosting method for distributed spectral clustering
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-28 | DOI: 10.1016/j.patcog.2026.113168
Yingqiu Zhu, Danyang Huang
Spectral clustering is one of the most popular clustering techniques in statistical inference. When applied to large-scale datasets, distributed spectral clustering typically faces two major challenges. First, distributed storage may disrupt the original network structure. Second, communication among computers within a distributed system results in high communication costs. In this work, we propose a communication-efficient algorithm for distributed spectral clustering. Our motivation stems from a theoretical comparison between spectral clustering on the entire dataset (global spectral clustering) and on a subsample (local spectral clustering), where we analyze the key factors underlying their performance differences. Based on the comparison, we propose a communication-efficient distributed spectral clustering (CEDSC) method, which iteratively aggregates intermediate outputs from local spectral clustering to approximate the corresponding global quantity. In this process, only low-dimensional vectors are exchanged between computers, which is shown to be communication efficient. Simulation studies and real-data applications show that CEDSC attains higher clustering accuracy than existing distributed spectral clustering methods while using only modest communication. When clustering 10,000 objects, CEDSC improves clustering accuracy by about 37% over the best baseline, with communication time below 0.4 seconds and comparable to the most communication-efficient method.
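The sketch below shows a generic divide-and-conquer pattern in the same spirit as the abstract: each worker clusters its subsample spectrally and only low-dimensional centroid vectors are communicated to a coordinator that merges them. This is a simplified illustration under assumed design choices, not the CEDSC algorithm itself.

```python
# Generic divide-and-conquer illustration (NOT the paper's CEDSC algorithm):
# workers run local spectral clustering and ship only a few low-dimensional
# centroid vectors to the coordinator, which merges them into global clusters.
import numpy as np
from sklearn.cluster import SpectralClustering, KMeans

def worker_step(X_local, k):
    labels = SpectralClustering(n_clusters=k, affinity="rbf", random_state=0).fit_predict(X_local)
    centroids = np.stack([X_local[labels == j].mean(0) for j in range(k)])
    return labels, centroids           # only `centroids` (k x d floats) is communicated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, .3, (100, 2)), rng.normal(4, .3, (100, 2))])
    parts = np.array_split(rng.permutation(len(X)), 4)       # simulate 4 machines

    local_out, sent = [], []
    for idx in parts:
        lab, cen = worker_step(X[idx], k=2)
        local_out.append((idx, lab))
        sent.append(cen)

    # Coordinator: cluster the communicated centroids, then relabel each worker's points.
    merge = KMeans(n_clusters=2, n_init=10, random_state=0).fit(np.vstack(sent))
    global_labels = np.empty(len(X), dtype=int)
    for w, (idx, lab) in enumerate(local_out):
        mapping = merge.labels_[w * 2:(w + 1) * 2]            # local cluster id -> global id
        global_labels[idx] = mapping[lab]
    print(np.bincount(global_labels))                          # roughly [100 100]
```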
Citations: 0
A two-stage learning framework with a beam image dataset for automatic laser resonator alignment
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.patcog.2026.113145
Shaoxiang Guo, Donald Risbridger, David A. Robb, Xianwen Kong, M. J. Daniel Esser, Michael J. Chantler, Richard M. Carter, Mustafa Suphi Erden
Accurate alignment of a laser resonator is essential for upscaling industrial laser manufacturing and precision processing. However, traditional manual or semi-automatic methods depend heavily on operator expertise and struggle with the interdependence among multiple alignment parameters. To tackle this, we introduce the first real-world image dataset for automatic laser resonator alignment, collected on a laboratory-built resonator setup. It comprises over 6000 beam profiler images annotated with four key alignment parameters (intracavity iris aperture diameter, output coupler pitch and yaw actuator displacements, and axial position of the output coupler), with over 500,000 paired samples for data-driven alignment. Given a pair of beam profiler images exhibiting distinct beam patterns under different configurations, the system predicts the control-parameter changes required to realign the resonator. Leveraging this dataset, we propose a novel two-stage deep learning framework for automatic resonator alignment. In Stage 1, a multi-scale CNN, augmented with cross-attention and correlation-difference modules, extracts features and outputs an initial coarse prediction of alignment parameters. In Stage 2, a feature-difference map is computed by subtracting the paired feature representations and fed into an iterative refinement module to correct residual misalignments. The final prediction combines coarse and refined estimates, integrating global context with fine-grained corrections for accurate inference. Experiments on our dataset, and on an instance of the physical system different from the one the CNN was trained on, suggest superior accuracy and practicality compared with manual alignment.
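The coarse-plus-refine structure described above can be sketched schematically as below, where a shared encoder processes the paired beam images, a coarse head regresses the four parameter changes, and a second head predicts a residual correction from the feature difference. Layer sizes and the fusion scheme are illustrative assumptions, not the paper's architecture.

```python
# Schematic sketch of a two-stage "coarse + refine" regressor that predicts the
# change in 4 alignment parameters from a pair of beam images. Sizes are assumed.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, x):
        return self.body(x)                          # (B, 32)

class TwoStageAligner(nn.Module):
    def __init__(self, n_params=4):
        super().__init__()
        self.enc = Encoder()                          # shared weights for both images
        self.coarse = nn.Linear(64, n_params)         # stage 1: concatenated features
        self.refine = nn.Linear(32, n_params)         # stage 2: feature-difference input

    def forward(self, img_a, img_b):
        fa, fb = self.enc(img_a), self.enc(img_b)
        coarse = self.coarse(torch.cat([fa, fb], dim=1))
        residual = self.refine(fa - fb)               # corrects remaining misalignment
        return coarse + residual                      # predicted parameter changes

if __name__ == "__main__":
    model = TwoStageAligner()
    a, b = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
    print(model(a, b).shape)                          # torch.Size([2, 4])
```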
Citations: 0
MC-MVSNet: When multi-view stereo meets monocular cues
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.patcog.2026.113166
Xincheng Tang, Mengqi Rong, Bin Fan, Hongmin Liu, Shuhan Shen
Learning-based Multi-View Stereo (MVS) has become a key technique for reconstructing dense 3D point clouds from multiple calibrated images. However, real-world challenges such as occlusions and textureless regions often hinder accurate depth estimation. Recent advances in monocular Vision Foundation Models (VFMs) have demonstrated strong generalization capabilities in scene understanding, offering new opportunities to enhance the robustness of MVS. In this paper, we present MC-MVSNet, a novel MVS framework that integrates diverse monocular cues to improve depth estimation under challenging conditions. During feature extraction, we fuse conventional CNN features with VFM-derived representations through a hybrid feature fusion module, effectively combining local details and global context for more discriminative feature matching. We also propose a cost volume filtering module that enforces cross-view geometric consistency on monocular depth predictions, pruning redundant depth hypotheses to reduce the depth search space and mitigate matching ambiguity. Additionally, we leverage monocular surface normals to construct a curved patch cost aggregation module that aggregates costs over geometry-aligned curved patches, which improves depth estimation accuracy in curved and textureless regions. Extensive experiments on the DTU, Tanks and Temples, and ETH3D benchmarks demonstrate that MC-MVSNet achieves state-of-the-art performance and exhibits strong generalization capabilities, validating the effectiveness and robustness of the proposed method.
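The hypothesis-pruning idea behind the cost volume filtering module can be illustrated as below: depth hypotheses that fall outside a band around a monocular depth prediction are masked before cost aggregation. The band width and fill value are assumptions for illustration, not the exact filtering rule of MC-MVSNet.

```python
# Minimal sketch of pruning plane-sweep depth hypotheses with a monocular prior:
# hypotheses far from the monocular prediction get a large cost before aggregation.
import torch

def filter_cost_volume(cost, hyps, mono_depth, rel_band=0.2, fill=1e3):
    """cost: (B, D, H, W) matching cost per depth hypothesis.
    hyps: (D,) candidate depths; mono_depth: (B, H, W) monocular depth prediction."""
    d = hyps.view(1, -1, 1, 1)                        # (1, D, 1, 1)
    m = mono_depth.unsqueeze(1)                       # (B, 1, H, W)
    keep = (d - m).abs() <= rel_band * m              # keep hypotheses near the prior
    return torch.where(keep, cost, torch.full_like(cost, fill))

if __name__ == "__main__":
    B, D, H, W = 1, 8, 4, 4
    cost = torch.rand(B, D, H, W)
    hyps = torch.linspace(1.0, 8.0, D)
    mono = torch.full((B, H, W), 3.0)
    filtered = filter_cost_volume(cost, hyps, mono)
    print((filtered < 1e3).sum(dim=1))                # surviving hypotheses per pixel
```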
Citations: 0
Learning discriminative features within forward-Forward algorithm using convolutional prototype
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.patcog.2026.113139
Qiufu Li, Zewen Li, Linlin Shen
Compared to back-propagation algorithms, the forward-forward (FF) algorithm proposed by Hinton [1] can optimize all layers of a deep network in parallel, while requiring less storage and achieving higher computational efficiency. However, current FF methods cannot fully leverage the label information of samples, which suppresses the learning of discriminative features. In this paper, we propose prototype learning within the FF algorithm (PLFF). When optimizing each convolutional layer, PLFF first divides the convolutional kernels into groups according to the number K of classes; these groups serve as class prototypes during optimization and are referred to as convolutional prototypes. For every sample, K goodness scores are calculated from the convolutions between the sample data and the convolutional prototypes. Then, using multiple binary cross-entropy losses, PLFF maximizes the positive goodness score corresponding to the sample label while minimizing the other, negative goodness scores, so as to learn discriminative features. Meanwhile, PLFF maximizes the cosine distances among the K convolutional prototypes, which enhances their discrimination and, in turn, promotes the learning of features. Image classification results across multiple datasets show that PLFF achieves the best performance among FF methods. Finally, for the first time, we verify the long-tailed recognition performance of different FF methods, demonstrating that our PLFF achieves superior results.
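A toy sketch of the goodness-per-class-group idea is given below: the kernels of one convolutional layer are split into K groups, a goodness score is computed per group, binary cross-entropy pushes up the score for the labelled class and down the others, and a separation term discourages similar prototype directions. The exact goodness definition, threshold, and loss weighting are assumptions, not the PLFF formulation.

```python
# Toy sketch: per-class kernel groups ("convolutional prototypes"), one goodness
# score per group, BCE on positive vs. negative goodness, plus a separation term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeConvLayer(nn.Module):
    def __init__(self, in_ch=1, ch_per_class=8, num_classes=10):
        super().__init__()
        self.K, self.c = num_classes, ch_per_class
        self.conv = nn.Conv2d(in_ch, ch_per_class * num_classes, 3, padding=1)

    def goodness(self, x):
        h = F.relu(self.conv(x))                                 # (B, K*c, H, W)
        h = h.view(x.size(0), self.K, self.c, *h.shape[-2:])
        return (h ** 2).mean(dim=(2, 3, 4))                      # (B, K) score per class group

    def loss(self, x, y, theta=1.0):
        g = self.goodness(x)
        target = F.one_hot(y, self.K).float()
        cls_loss = F.binary_cross_entropy_with_logits(g - theta, target)
        # Encourage the K kernel groups (prototypes) to point in different directions.
        w = F.normalize(self.conv.weight.view(self.K, -1), dim=1)
        sim = w @ w.t()
        sep_loss = sim[~torch.eye(self.K, dtype=torch.bool)].abs().mean()
        return cls_loss + sep_loss

if __name__ == "__main__":
    layer = PrototypeConvLayer()
    x, y = torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,))
    print(layer.loss(x, y).item())                               # this layer can be optimized in isolation
```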
Citations: 0
EEnvA-Mamba: Effective and environtology-aware adaptive Mamba for road object detection in adverse weather scenes
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.patcog.2026.113127
Yonglin Chen, Binzhi Fan, Nan Liu, Yalong Yang, Jinhui Tang
Adverse weather conditions severely degrade visual perception in autonomous driving systems, primarily due to image quality deterioration, object occlusion, and unstable illumination. Current deep learning-based detection methods exhibit limited robustness in such scenarios, as corrupted features and inefficient algorithmic adaptation impair their performance under weather variations. To overcome these challenges, we propose EEnvA-Mamba, a computationally efficient architecture that synergizes real-time processing with high detection accuracy. The framework features three core components: (1) AVSSBlock, a vision state-space block that incorporates environment-aware gating (dynamically adjusting feature-channel weights based on weather conditions) and weather-conditioned channel weighting (unequal channel responses under different weather types), effectively mitigating feature degradation; (2) a linear-complexity computation scheme that replaces conventional quadratic Transformer operations while preserving discriminative feature learning; (3) AStem, an attention-guided dual-branch module that strengthens local feature extraction via spatial-channel interactions while employing frequency domain denoising techniques to suppress noise across various frequencies, ensuring precise dependency modeling. To support rigorous validation, we collected and annotated VOC-SNOW, a dedicated snowy-road dataset comprising 2700 annotated images with diverse illumination and snowfall levels. Comparative experiments on multiple datasets verify our method’s superiority, demonstrating state-of-the-art performance with 66.4% APval (4.5% higher than leading counterparts). The source code has been released at https://github.com/fbzahwy/EEnvA-Mamba.
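The environment-aware gating component can be illustrated with the small module below, which turns a weather code into per-channel weights that rescale a feature map. The conditioning signal and layer sizes are assumptions for illustration, not the AVSSBlock design.

```python
# Small sketch of an environment-aware channel gate: a weather/environment code
# becomes per-channel weights that rescale the feature map. Sizes are assumed.
import torch
import torch.nn as nn

class EnvChannelGate(nn.Module):
    def __init__(self, channels=64, num_env_types=4):
        super().__init__()
        self.env_embed = nn.Embedding(num_env_types, 32)        # e.g. clear / rain / fog / snow
        self.to_gate = nn.Sequential(nn.Linear(32 + channels, channels), nn.Sigmoid())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, feat, env_id):
        # feat: (B, C, H, W); env_id: (B,) integer environment label or prediction
        ctx = torch.cat([self.pool(feat).flatten(1), self.env_embed(env_id)], dim=1)
        gate = self.to_gate(ctx).unsqueeze(-1).unsqueeze(-1)     # (B, C, 1, 1) channel weights
        return feat * gate                                        # weather-conditioned re-weighting

if __name__ == "__main__":
    gate = EnvChannelGate()
    out = gate(torch.randn(2, 64, 32, 32), torch.tensor([1, 3]))
    print(out.shape)                                              # torch.Size([2, 64, 32, 32])
```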
Citations: 0
FeatureSORT: A robust tracker with optimized feature integration
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.patcog.2026.113148
Hamidreza Hashempoor, Rosemary Koikara, Yu Dong Hwang
We introduce FeatureSORT, a simple yet effective online multiple object tracker that reinforces the baselines with a redesigned detector and additional feature cues, while keeping computational complexity low. In contrast to conventional detectors that only provide bounding boxes, our detector architecture is extended to output multiple appearance attributes, including clothing color, clothing style, and motion direction, alongside the bounding boxes. These feature cues, together with a ReID network, form complementary embeddings that substantially improve association accuracy. The rationale behind selecting and combining these attributes is thoroughly examined in extensive ablation studies. Furthermore, we incorporate stronger post-processing strategies, such as global linking and Gaussian Smoothing Process interpolation, to handle missing associations and detections. During online tracking, we define a measurement-to-track distance function that jointly considers IoU, direction, color, style, and ReID similarity. This design enables FeatureSORT to maintain consistent identities through longer occlusions while reducing identity switches. Extensive experiments on standard MOT benchmarks demonstrate that FeatureSORT achieves state-of-the-art (SOTA) online performance, with MOTA scores of 79.7 on MOT16, 80.6 on MOT17, 77.9 on MOT20, and 92.2 on DanceTrack, underscoring the effectiveness of feature-enriched detection in advancing multi-object tracking. Our GitHub repository includes the code implementation.
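A sketch of a combined measurement-to-track cost in the spirit described above is shown below; the individual terms and their weights are illustrative assumptions rather than FeatureSORT's exact cost function.

```python
# Sketch of a combined association cost (IoU + direction + color + style + ReID).
# The terms and weights below are placeholders, not FeatureSORT's definitions.
import numpy as np

def iou(a, b):
    """a, b: boxes as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / area if area > 0 else 0.0

def cosine_sim(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def association_cost(det, trk, w=(0.4, 0.2, 0.1, 0.1, 0.2)):
    """det/trk: dicts with 'box', 'direction', 'color', 'style', 'reid' entries."""
    terms = [
        1.0 - iou(det["box"], trk["box"]),
        0.5 * (1.0 - cosine_sim(det["direction"], trk["direction"])),
        0.0 if det["color"] == trk["color"] else 1.0,
        0.0 if det["style"] == trk["style"] else 1.0,
        0.5 * (1.0 - cosine_sim(det["reid"], trk["reid"])),
    ]
    return float(np.dot(w, terms))      # lower cost = better match for Hungarian assignment

if __name__ == "__main__":
    d = {"box": [0, 0, 10, 20], "direction": [1, 0], "color": "red", "style": "jacket",
         "reid": np.ones(8)}
    t = {"box": [1, 1, 11, 21], "direction": [1, 0.1], "color": "red", "style": "jacket",
         "reid": np.ones(8)}
    print(round(association_cost(d, t), 3))
```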
Citations: 0
PoseAdapter: Efficiently transferring 2D human pose estimator to 3D whole-body task via adapter
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.patcog.2026.113154
Ze Feng, Sen Yang, Jiang-Jiang Liu, Wankou Yang
In this paper, we explore the task of 3D whole-body pose estimation based on a single-frame image and propose a new paradigm called PoseAdapter, which exploits a well-pretrained 2D human pose estimation model equipped with an adapter. The mainstream paradigms for 3D human pose estimation typically require multiple stages, such as human box detection, 2D pose estimation, and lifting to 3D coordinates. Such a multi-stage approach tends to lose context information in the compression process, resulting in inferior pose estimates, particularly for dense prediction tasks such as 3D whole-body pose estimation. To improve the accuracy of pose estimation, some methods even use multi-frame fusion to enhance the current pose, including input from future frames, which is inherently non-causal. Considering that end-to-end 2D human pose methods can extract human-related and keypoint-specific visual features, we aim to employ such a model as a general vision-based human analysis backbone and enable it to predict 3D whole-body poses. By freezing most of the parameters of the 2D model and tuning the newly added adapter, PoseAdapter can transfer the 2D estimator to the 3D pose task in a parameter-efficient manner, while retaining the original ability to distinguish multiple human instances. Quantitative experimental results on H3WB demonstrate that PoseAdapter, with fewer trainable parameters, achieves an accuracy of 62.74 mm MPJPE. Qualitative results also show that PoseAdapter can predict multi-person 3D whole-body poses and generalizes to out-of-domain datasets such as COCO.
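The parameter-efficient recipe described above (freeze the 2D backbone, train small adapters and a new 3D head) can be sketched as below. The adapter shape, its attachment point, and the stand-in backbone are assumptions for illustration, not the PoseAdapter architecture.

```python
# Minimal sketch of adapter-based transfer: frozen backbone, trainable bottleneck
# adapter, and a new whole-body 3D head. Shapes and attachment point are assumed.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)                # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adaptation

class AdaptedPoseModel(nn.Module):
    def __init__(self, backbone, feat_dim=256, num_joints=133):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                    # pretrained 2D weights stay frozen
        self.adapter = Adapter(feat_dim)
        self.head_3d = nn.Linear(feat_dim, num_joints * 3)   # new whole-body 3D head

    def forward(self, x):
        feat = self.backbone(x)                        # (B, feat_dim)
        return self.head_3d(self.adapter(feat)).view(x.size(0), -1, 3)

if __name__ == "__main__":
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # stand-in 2D encoder
    model = AdaptedPoseModel(backbone)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(model(torch.randn(2, 3, 32, 32)).shape, f"trainable params: {trainable}")
```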
Citations: 0