Zhi Yu , Zhiyong Huang , Mingyang Hou , Yan Yan , Yushi Liu
Title: WTSF-ReID: Depth-driven Window-oriented Token Selection and Fusion for multi-modality vehicle re-identification with knowledge consistency constraint
DOI: 10.1016/j.eswa.2025.126921
Journal: Expert Systems with Applications, Volume 274, Article 126921 (Q1, Computer Science, Artificial Intelligence; IF 7.5)
Publication date: 2025-02-22
URL: https://www.sciencedirect.com/science/article/pii/S0957417425005433
Citations: 0
Abstract
Multi-modality vehicle re-identification, a crucial task in intelligent transportation systems, aims to retrieve specific vehicles across non-overlapping cameras by amalgamating visible and infrared images. The main challenge lies in mitigating inter-modality discrepancies and extracting modality-irrelevant vehicle information. Existing methods concentrate on the integration of distinct modalities, but pay less attention to modality-specific crucial information. To this end, we propose a novel depth-driven Window-oriented Token Selection and Fusion network, designated WTSF-ReID. Specifically, WTSF-ReID comprises three distinct modules. The initial component is a Multi-modality General Feature Extraction (MGFE) module, which employs a weight-shared vision transformer to extract features from multi-modality images. The subsequent component is a depth-driven Window-oriented Token Selection and Fusion (WTSF) module, which implements local-to-global windows to select the significant tokens, followed by token fusion and feature aggregation to extract modality-specific crucial information while mitigating inter-modality discrepancies. Finally, to further reduce inter-modality heterogeneity and enhance feature discriminability, a Knowledge Consistency Constraint (KCC) loss is constructed that simultaneously deploys an inter-modality token selection constraint, a modality center constraint, and a modality triplet constraint. Extensive experiments on popular datasets demonstrate competitive performance against state-of-the-art methods. The datasets and code are available at https://github.com/unicofu/WTSF-ReID.
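The window-oriented token selection and fusion step can be illustrated with a minimal sketch. This is not the paper's implementation: the importance score (L2 norm of each token embedding), the non-overlapping window partition, and mean-pooling fusion are all simplifying assumptions made here for illustration; the actual WTSF module operates on transformer tokens with its own depth-driven selection criterion.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a token embedding, used here as an importance proxy."""
    return math.sqrt(sum(x * x for x in vec))


def window_token_select_fuse(tokens, window_size, top_k):
    """Partition a token sequence into non-overlapping windows, keep the
    top_k highest-scoring tokens per window, then fuse the survivors by
    element-wise mean pooling into a single aggregated feature.

    Scoring by L2 norm is an assumption of this sketch, not the paper's
    actual selection criterion.
    """
    selected = []
    for start in range(0, len(tokens), window_size):
        window = tokens[start:start + window_size]
        ranked = sorted(window, key=l2_norm, reverse=True)
        selected.extend(ranked[:top_k])
    # Fuse: element-wise mean over the selected tokens.
    dim = len(selected[0])
    fused = [sum(tok[d] for tok in selected) / len(selected) for d in range(dim)]
    return selected, fused


# Toy example: six 2-d tokens, windows of 3, keep 1 token per window.
tokens = [[1, 0], [0, 2], [3, 0], [0, 0], [5, 0], [0, 1]]
sel, fused = window_token_select_fuse(tokens, window_size=3, top_k=1)
# sel  -> [[3, 0], [5, 0]]  (highest-norm token from each window)
# fused -> [4.0, 0.0]
```

In the paper's local-to-global design, this per-window selection would presumably be applied at multiple window granularities before the cross-modality fusion, but that multi-scale scheduling is beyond what the abstract specifies.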
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.