Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.018
Nan Yang, Cheuk Hang Leung, Xing Yan
In this paper, we introduce a novel distance measure, conforming to the definition of a semi-distance, for quantifying the similarity between Hidden Markov Models (HMMs). This distance measure is not only easier to implement than existing measures, but also accounts for state alignment before the distance calculation, ensuring correctness and accuracy. It represents a significant advance in HMM comparison, offering a more practical and accurate solution. Numerical examples demonstrating the utility of the proposed distance measure are given for HMMs with continuous state probability densities. In real-world data experiments, we employ HMMs to represent the evolution of financial time series or music. Leveraging the proposed distance measure, we then conduct HMM-based unsupervised clustering, with promising results. Our approach proves effective in capturing the inherent differences in the dynamics of financial time series, showcasing the practicality and success of the proposed distance measure.
{"title":"A novel HMM distance measure with state alignment","authors":"Nan Yang , Cheuk Hang Leung , Xing Yan","doi":"10.1016/j.patrec.2024.10.018","DOIUrl":"10.1016/j.patrec.2024.10.018","url":null,"abstract":"<div><div>In this paper, we introduce a novel distance measure that conforms to the definition of a semi-distance, for quantifying the similarity between Hidden Markov Models (HMMs). This distance measure is not only easier to implement, but also accounts for state alignment before distance calculation, ensuring correctness and accuracy. Our proposed distance measure presents a significant advancement in HMM comparison, offering a more practical and accurate solution compared to existing measures. Numerical examples that demonstrate the utility of the proposed distance measure are given for HMMs with continuous state probability densities. In real-world data experiments, we employ HMM to represent the evolution of financial time series or music. Subsequently, leveraging the proposed distance measure, we conduct HMM-based unsupervised clustering, demonstrating promising results. Our approach proves effective in capturing the inherent difference in dynamics of financial time series, showcasing the practicality and success of the proposed distance measure.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 314-321"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.014
Junghyun Seo, Sungjun Wang, Hyeonjae Jeon, Taesoo Kim, Yongsik Jin, Soon Kwon, Jeseok Kim, Yongseob Lim
There are diverse datasets available for training deep learning models used in autonomous driving. However, most of these datasets consist of images captured in daytime conditions, leading to a data imbalance when dealing with night-condition images. Several day-to-night image translation models have been proposed to remedy the shortage of night-condition data, but these models often generate artifacts and cannot control the brightness of the generated image. In this study, we propose LuminanceGAN, which controls the degree of brightness under night conditions to generate realistic night-image outputs. The proposed Y-control loss drives the brightness of the output image toward a specified luminance value. Furthermore, a self-attention module effectively reduces artifacts in the generated images. Consequently, in qualitative comparisons, our model demonstrates superior performance in day-to-night image translation. Additionally, a quantitative evaluation using lane detection models shows that our proposed method improves performance on night lane detection tasks. Moreover, the quality of the generated indoor dark images was assessed with an evaluation metric, showing that our model generates images more similar to real dark images than other image translation models do.
{"title":"LuminanceGAN: Controlling the brightness of generated images for various night conditions","authors":"Junghyun Seo , Sungjun Wang , Hyeonjae Jeon , Taesoo Kim , Yongsik Jin , Soon Kwon , Jeseok Kim , Yongseob Lim","doi":"10.1016/j.patrec.2024.10.014","DOIUrl":"10.1016/j.patrec.2024.10.014","url":null,"abstract":"<div><div>There are diverse datasets available for training deep learning models utilized in autonomous driving. However, most of these datasets are composed of images obtained in day conditions, leading to a data imbalance issue when dealing with night condition images. Several day-to-night image translation models have been proposed to resolve the insufficiency of the night condition dataset, but these models often generate artifacts and cannot control the brightness of the generated image. In this study, we propose a LuminanceGAN, for controlling the brightness degree in night conditions to generate realistic night image outputs. The proposed novel Y-control loss converges the brightness degree of the output image to a specific luminance value. Furthermore, the implementation of the self-attention module effectively reduces artifacts in the generated images. Consequently, in qualitative comparisons, our model demonstrates superior performance in day-to-night image translation. Additionally, a quantitative evaluation was conducted using lane detection models, showing that our proposed method improves performance in night lane detection tasks. Moreover, the quality of the generated indoor dark images was assessed using an evaluation metric. It can be proven that our model generates images most similar to real dark images compared to other image translation models.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 292-299"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.007
Jongmin Yu, Hyeontaek Oh, Younkwan Lee, Jinhong Yang
This paper proposes the Adversarial Denoising Diffusion Model (ADDM). Diffusion models excel at generating high-quality samples, outperforming other generative models, and their strong sampling ability also yields outstanding results in medical image anomaly detection (AD). However, the performance of diffusion model-based methods varies considerably with the sampling frequency, and the time cost of generating good-quality samples is significantly higher than that of other generative models. We propose ADDM, a diffusion model-based AD method trained with adversarial learning, which maintains high-quality sample generation while significantly reducing the number of sampling steps. The adversarial learning is achieved by classifying model-denoised samples against samples to which random Gaussian noise is added at a specific sampling step. Unlike the standard diffusion loss, which is defined in the noise space to match the predicted noise to the scheduled noise, the adversarial objective is defined in the sample space, so the model can explicitly learn semantic information about the samples. Our experiments demonstrate that adversarial learning achieves data sampling performance similar to the DDPM with far fewer sampling steps. Experimental results show that the proposed ADDM outperforms existing unsupervised AD methods on brain MRI images. In particular, on 22 T1-weighted MRI scans provided by the Centre for Clinical Brain Sciences at the University of Edinburgh, ADDM achieves similar performance with 50% fewer sampling steps than other DDPM-based AD methods, and 6.2% better performance on the Dice metric with the same number of sampling steps.
{"title":"Denoising diffusion model with adversarial learning for unsupervised anomaly detection on brain MRI images","authors":"Jongmin Yu , Hyeontaek Oh , Younkwan Lee , Jinhong Yang","doi":"10.1016/j.patrec.2024.10.007","DOIUrl":"10.1016/j.patrec.2024.10.007","url":null,"abstract":"<div><div>This paper proposes the Adversarial Denoising Diffusion Model (ADDM). Diffusion models excel at generating high-quality samples, outperforming other generative models. These models also achieve outstanding medical image anomaly detection (AD) results due to their strong sampling ability. However, the performance of the diffusion model-based methods is highly varied depending on the sampling frequency, and the time cost to generate good-quality samples is significantly higher than that of other generative models. We propose the ADDM, a diffusion model-based AD method trained with adversarial learning that can maintain high-quality sample generation ability and significantly reduce the number of sampling steps. The proposed adversarial learning is achieved by classifying model-based denoised samples and samples to which random Gaussian noise is added to a specific sampling step. Compared with the loss function of diffusion models, defined under the noise space to minimise the predicted noise and scheduled noise, the diffusion model can explicitly learn semantic information about the sample space since adversarial learning is defined based on the sample space. Our experiment demonstrated that adversarial learning helps achieve a data sampling performance similar to the DDPM with much fewer sampling steps. Experimental results show that the proposed ADDM outperformed existing unsupervised AD methods on Brain MRI images. In particular, in the comparison using 22 T1-weighted MRI scans provided by the Centre for Clinical Brain Sciences from the University of Edinburgh, the ADDM achieves similar performance with 50% fewer sampling steps than other DDPM-based AD methods, and it shows 6.2% better performance about the Dice metric with the same number of sampling steps.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 229-235"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.09.023
Ting-Ruen Wei, Yuan Wang, Yoshitaka Inoue, Hsin-Tai Wu, Yi Fang
Missing data in tabular datasets is a common issue, as the performance of downstream tasks usually depends on the completeness of the training data. Previous missing-data imputation methods focus on numeric and categorical columns; we propose a novel end-to-end approach, Table Transformers for Imputing Textual Attributes (TTITA), which uses a transformer to impute unstructured textual columns from the other columns of the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence is longer. Additionally, we incorporate multi-task learning to impute heterogeneous columns simultaneously, boosting performance on text imputation. We also qualitatively compare with ChatGPT for realistic applications.
{"title":"Table Transformers for imputing textual attributes","authors":"Ting-Ruen Wei , Yuan Wang , Yoshitaka Inoue , Hsin-Tai Wu , Yi Fang","doi":"10.1016/j.patrec.2024.09.023","DOIUrl":"10.1016/j.patrec.2024.09.023","url":null,"abstract":"<div><div>Missing data in tabular dataset is a common issue as the performance of downstream tasks usually depends on the completeness of the training dataset. Previous missing data imputation methods focus on numeric and categorical columns, but we propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA) based on the transformer to impute unstructured textual columns using other columns in the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence has a longer length. Additionally, we incorporate multi-task learning to simultaneously impute for heterogeneous columns, boosting the performance for text imputation. We also qualitatively compare with ChatGPT for realistic applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 258-264"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142551873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.008
Jinfeng Li, Meiling Feng, Chengyi Xia
Convolutional Neural Networks (CNNs) are extensively utilized in medical disease diagnosis, demonstrating strong performance in most cases. However, medical image processing based on deep learning faces several challenges: the limited availability and time-consuming annotation of medical image data restrict the scale and accuracy of model training, and data diversity and complexity compound these difficulties. To address these issues, we introduce the Double Branch Convolutional Transformer (DBCvT), a hybrid CNN-Transformer feature extractor that better captures diverse fine-grained features and remains suitable for small datasets. In this model, separable downsampling convolution (SDConv) is used to mitigate the excessive information loss of downsampling with standard strided convolutions. Additionally, we propose the Dual branch Channel Efficient multi-head Self-Attention (DCESA) mechanism to improve self-attention efficiency, thereby elevating network performance and effectiveness. Moreover, we introduce a novel convolutional channel-enhanced attention mechanism to strengthen inter-channel relationships within feature maps after self-attention. Experiments with DBCvT on various medical image datasets demonstrate the outstanding classification performance and generalization capability of the proposed model.
{"title":"DBCvT: Double Branch Convolutional Transformer for Medical Image Classification","authors":"Jinfeng Li , Meiling Feng , Chengyi Xia","doi":"10.1016/j.patrec.2024.10.008","DOIUrl":"10.1016/j.patrec.2024.10.008","url":null,"abstract":"<div><div>Convolutional Neural Networks (CNNs) are extensively utilized in medical disease diagnosis, demonstrating the prominent performance in most cases. However, medical image processing based on deep learning faces some challenges. The limited availability and time-consuming annotations of medical image data restrict the scale and accuracy of model training. Data diversity and complexity further complicate these challenges. In order to address these issues, we introduce the Double Branch Convolutional Transformer (DBCvT), a hybrid CNN-Transformer feature extractor, which can better capture diverse fine-grained features and remain suitable for small datasets. In this model, separable downsampling convolution (SDConv) is used to mitigate excessive information loss during downsampling in standard convolutions. Additionally, we propose the Dual branch Channel Efficient multi-head Self-Attention (DCESA) mechanism to enhance the self-attention efficiency, consequently elevating network performance and effectiveness. Moreover, we introduce a novel convolutional channel-enhanced Attention mechanism to strengthen inter-channel relationships within feature maps post self-attention. The experiments of DBCvT on various medical image datasets have demonstrated the outstanding classification performance and generalization capability of the proposed model.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 250-257"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Point cloud completion networks often encode points into a global feature vector and then predict the complete point cloud from that vector. However, this approach may not accurately capture complex shapes, as global feature vectors struggle to recover detailed structure. In this paper, we present a novel shape completion network, RD-Net, which focuses on the interaction of information between points to provide both local and global information for generating a fine-grained complete shape. Specifically, we propose a stored-iteration-based method for point cloud sampling that quickly captures representative points within the point cloud. Subsequently, to better predict the shape and structure of the missing part, we design an iterative edge-convolution module that uses a CNN-like hierarchy for feature extraction and learning of context information. Moreover, we design a two-stage reconstruction process for latent vector decoding: we first employ a feature-points-based multi-scale generating decoder to estimate the missing point cloud hierarchically, followed by a self-attention mechanism that refines the generated shape and effectively produces structural details. By combining these innovations, RD-Net achieves a 2% reduction in CD error compared to the state-of-the-art method on the ShapeNet-part dataset.
{"title":"Regional dynamic point cloud completion network","authors":"Liping Zhu, Yixuan Yang, Kai Liu, Silin Wu, Bingyao Wang, Xianxiang Chang","doi":"10.1016/j.patrec.2024.10.017","DOIUrl":"10.1016/j.patrec.2024.10.017","url":null,"abstract":"<div><div>Point cloud completion network often encodes points into a global feature vector, then predicts the complete point cloud through the vector generation process. However, this method may not accurately capture complex shapes, as global feature vectors struggle to recover their detailed structure. In this paper, we present a novel shape completion network, namely RD-Net, that innovatively focuses on the interaction of information between points to provide both local and global information for generating fine-grained complete shape. Specifically, we propose a stored iteration-based method for point cloud sampling that quickly captures representative points within the point cloud. Subsequently, in order to better predict the shape and structure of the missing part, we design an iterative edge-convolution module. It uses a CNN-like hierarchy for feature extraction and learning context information. Moreover, we design a two-stage reconstruction process for latent vector decoding. We first employ a feature-points-based multi-scale generating decoder to estimate the missing point cloud hierarchically. This is followed by a self-attention mechanism that refines the generated shape and effectively generates structural details. By combining these innovations, RD-Net achieves a 2% reduction in CD error compared to the state-of-the-art method on the ShapeNet-part dataset.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 322-329"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.10.009
Beomjo Kim, Kyung-Ah Sohn
This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at https://github.com/8eomio/Subject-Inpainting.
{"title":"Text-free diffusion inpainting using reference images for enhanced visual fidelity","authors":"Beomjo Kim, Kyung-Ah Sohn","doi":"10.1016/j.patrec.2024.10.009","DOIUrl":"10.1016/j.patrec.2024.10.009","url":null,"abstract":"<div><div>This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at <span><span>https://github.com/8eomio/Subject-Inpainting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 221-228"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.11.001
Mingxuan Chen, Shiqi Li, Xujun Wei, Jiacheng Song
In the rapidly advancing field of industrial automation, the availability of robust and diverse datasets is crucial for developing and evaluating machine learning models. The data repository consists of four distinct dataset versions: MMIFR-D, MMIFR-FS, MMIFR-OD, and MMIFR-P. The MMIFR-D dataset comprises 5907 images with corresponding textual descriptions, notably facilitating industrial equipment classification. The MMIFR-FS dataset is an alternative variant containing 129 distinct classes and 5907 images, specifically catering to few-shot learning within the industrial domain. MMIFR-OD, another variant comprising 8,839 annotation instances across 128 distinct categories, is predominantly used for object detection tasks. Additionally, the MMIFR-P dataset consists of 142 textual-visual information pairs, making it suitable for detecting pairs of industrial equipment. Furthermore, we conduct a comprehensive comparative analysis of our dataset against other datasets used in industrial settings, and provide benchmark performances for different industrial tasks on our data repository. The proposed multimodal dataset, MMIFR, can be used for research in industrial automation, quality control, safety monitoring, and other relevant domains.
{"title":"MMIFR: Multi-modal industry focused data repository","authors":"Mingxuan Chen , Shiqi Li , Xujun Wei , Jiacheng Song","doi":"10.1016/j.patrec.2024.11.001","DOIUrl":"10.1016/j.patrec.2024.11.001","url":null,"abstract":"<div><div>In the rapidly advancing field of industrial automation, the availability of robust and diverse datasets is crucial for the development and evaluation of machine learning models. The data repository consists of four distinct versions of datasets: MMIFR-D, MMIFR-FS, MMIFR-OD and MMIFR-P. The MMIFR-D dataset comprises a comprehensive assemblage of 5907 images accompanied by corresponding textual descriptions, notably facilitating the application of industrial equipment classification. In contrast, the MMIFR-FS dataset serves as an alternative variant characterized by the inclusion of 129 distinct classes and 5907 images, specifically catering to the task of few-shot learning within the industrial domain. MMIFR-OD is another alternative variant, comprising 8,839 annotation instances across 128 distinct categories, is predominantly utilized for object detection tasks. Additionally, the MMIFR-P dataset consists of 142 textual–visual information pairs, making it suitable for detecting pairs of industrial equipment. Furthermore, we conduct a comprehensive comparative analysis of our dataset in relation to other datasets used in industrial settings. Benchmark performances for different industrial tasks on our data repository are provided. The proposed multimodal dataset, MMIFR, can be utilized for research in industrial automation, quality control, safety monitoring, and other relevant domains.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 306-313"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-01 | DOI: 10.1016/j.patrec.2024.09.017
Yibo Lou, Wenjie Zhang, Xiaoning Song, Yang Hua, Xiao-Jun Wu
Efficiently leveraging semantic information has been crucial to recent advances in video captioning. However, prevailing approaches that design various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge to guide the training procedure, particularly for POS and initial description generation. Furthermore, restricting the model to a single source of semantic information ignores the varied interpretations inherent in each video. To solve these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module, which integrates semantic information as a prior and facilitates enriched interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module employs a query-based semantic expansion to collaboratively generate fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. Experiments show that, by exploiting semantic information through supervision, extension, and enhancement, EDS yields promising results and demonstrates its effectiveness. Code will be available at https://github.com/BradenJoson/EDS.
{"title":"EDS: Exploring deeper into semantics for video captioning","authors":"Yibo Lou , Wenjie Zhang , Xiaoning Song , Yang Hua , Xiao-Jun Wu","doi":"10.1016/j.patrec.2024.09.017","DOIUrl":"10.1016/j.patrec.2024.09.017","url":null,"abstract":"<div><div>Efficiently leveraging semantic information is crucial for advancing video captioning in recent years. But, prevailing approaches that involve designing various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge guidance throughout the training procedure, particularly in the context of POS and initial description generation. Furthermore, the restriction to a single source of semantic information ignores the potential for varied interpretations inherent in each video. To solve these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three feasible modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module. It integrates semantic information as a prior, and facilitates enriched interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module is proposed to employ a query-based semantic expansion for collaboratively generating fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. The experiments conducted show that, by exploiting semantic information through supervision, extension, and enhancement, EDS not only yields promising results but also underlines the effectiveness. Code will be available at <span><span>https://github.com/BradenJoson/EDS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 133-140"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning enables multiple clients to collaborate on training a model without sharing data; clients with insufficient data or data diversity participate in federated learning to obtain a model with superior performance. MRI data suffers from small sample sizes and differing data distributions caused by differences in MRI scanners and client characteristics, while privacy concerns preclude data sharing. In this work, we propose a novel adaptive federated meta-learning (FAM) mechanism for collaboratively learning a single global model, which is then personalized locally on individual clients. The learnt sparse global model captures the common features in the MRI data across clients. This model is grown on each client to learn a personalized model by capturing additional client-specific parameters from local data. Experimental results on multiple data sets show that the personalization process at each client converges quickly, within a limited number of epochs. The personalized client models outperformed the locally trained models, demonstrating the efficacy of the FAM mechanism. Additionally, the FAM-based sparse global model has fewer parameters and requires less communication overhead during federated learning, making it viable for networks with limited resources.
{"title":"FAM: Adaptive federated meta-learning for MRI data","authors":"Indrajeet Kumar Sinha, Shekhar Verma, Krishna Pratap Singh","doi":"10.1016/j.patrec.2024.09.018","DOIUrl":"10.1016/j.patrec.2024.09.018","url":null,"abstract":"<div><div>Federated learning enables multiple clients to collaborate to train a model without sharing data. Clients with insufficient data or data diversity participate in federated learning to learn a model with superior performance. MRI data suffers from inadequate data and different data distribution due to differences in MRI scanners and client characteristics. Also, privacy concerns preclude data sharing. In this work, we propose a novel adaptive federated meta-learning (FAM) mechanism for collaboratively learning a single global model, which is personalized locally on individual clients. The learnt sparse global model captures the common features in the MRI data across clients. This model is grown on each client to learn a personalized model by capturing additional client-specific parameters from local data. Experimental results on multiple data sets show that the personalization process at each client quickly converges using a limited number of epochs. The personalized client models outperformed the locally trained models, demonstrating the efficacy of the FAM mechanism. Additionally, the FAM-based sparse global model has fewer parameters that require less communication overhead during federated learning. This makes the model viable for networks with limited resources.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 205-212"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}