Data-efficient generalization for zero-shot composed image retrieval
Zining Chen, Zhicheng Zhao, Fei Su, Shijian Lu
Pattern Recognition, Volume 176, Article 113187. Pub Date: 2026-08-01; Epub Date: 2026-01-28; DOI: 10.1016/j.patcog.2026.113187
Zero-shot Composed Image Retrieval (ZS-CIR) aims to retrieve a target image from a reference image and a text description without requiring in-distribution triplets for training. One prevalent approach follows the vision-language pretraining paradigm and employs a mapping network to map the image embedding to a pseudo-word token in the text embedding space. However, this approach tends to impede network generalization because of the modality discrepancy and the distribution shift between training and inference. To this end, we propose a Data-efficient Generalization (DeG) framework with two novel designs, namely a Textual Supplement (TS) module and a Semantic Sample Pool (SSP) module. The TS module exploits compositional textual semantics during training, enriching the pseudo-word token with linguistic semantics and thus effectively mitigating the modality discrepancy. The SSP module exploits the zero-shot capability of pretrained Vision-Language Models (VLMs), alleviating the distribution shift and mitigating overfitting caused by redundancy in large-scale image-text data. Extensive experiments on four ZS-CIR benchmarks show that DeG outperforms state-of-the-art (SOTA) methods with much less training data while saving substantial training and inference time in practical usage.
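As a rough illustration of the pseudo-word-token idea described above (not the authors' DeG code), the following PyTorch sketch maps a frozen VLM image embedding to a pseudo-word token and prepends it to an embedded relative caption; the module names, dimensions, and prompt-assembly step are assumptions.

```python
# Illustrative sketch only: a minimal textual-inversion-style mapping network of the kind
# the abstract describes (image embedding -> pseudo-word token). Names and dimensions are
# assumptions, not the authors' released code.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps a frozen VLM image embedding to a pseudo-word token embedding."""
    def __init__(self, img_dim=768, token_dim=512, hidden=1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim, hidden), nn.GELU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, image_emb):          # (B, img_dim)
        return self.mlp(image_emb)         # (B, token_dim) pseudo-word token

def compose_query(pseudo_token, text_token_embs):
    """Prepend the pseudo-word token to the embedded relative caption,
    yielding the sequence a text encoder would consume."""
    # pseudo_token: (B, token_dim); text_token_embs: (B, L, token_dim)
    return torch.cat([pseudo_token.unsqueeze(1), text_token_embs], dim=1)

# toy usage
mapper = MappingNetwork()
img_emb = torch.randn(4, 768)              # reference-image embeddings from a frozen VLM
cap_embs = torch.randn(4, 12, 512)         # embedded tokens of the modification text
query_seq = compose_query(mapper(img_emb), cap_embs)
print(query_seq.shape)                     # torch.Size([4, 13, 512])
```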
Prioritized scanning: Combining spatial information multiple instance learning for computational pathology
Yuqi Zhang, Jiakai Wang, Baoyu Liang, Yuancheng Yang, Siyang Wu, Chao Tong
Pattern Recognition, Volume 176, Article 113151. Pub Date: 2026-08-01; Epub Date: 2026-01-24; DOI: 10.1016/j.patcog.2026.113151
Multiple instance learning (MIL) has emerged as a reliable paradigm that has propelled the integration of computational pathology (CPath) into clinical histopathology. However, despite significant advancements, current MIL approaches still struggle with inadequate spatial information representation caused by the unordered nature of patches in the original whole slide images (WSIs). To address this limitation, we first demonstrate the importance of prioritized scanning within structured state space models (SSMs). We then introduce an MIL framework that incorporates spatial information, termed Prioritized Scanning MIL (PSMIL). PSMIL primarily comprises two branches and a fusion block. The first branch, the spatial branch, injects potential spatial information into the patch sequence using the original 2D positions and employs an SSM to model the spatial features of the WSI. The second branch, the cross-spatial branch, utilizes a significance scoring block together with an SSM to exploit feature relationships among similar instances across spatial locations. Finally, a lightweight feature fusion block integrates the outputs of both branches, enabling more comprehensive feature utilization. Extensive experiments on 5 popular datasets and 3 downstream tasks demonstrate that PSMIL significantly surpasses state-of-the-art MIL methods, with accuracy (ACC) improvements of up to 5.26% for cancer subtyping. Our code is available at https://github.com/YuqiZhang-Buaa/PSMIL.
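To make the two-branch SSM idea concrete, here is a toy PyTorch sketch with a basic diagonal linear state-space scan, a raster-ordered spatial branch, and a score-ordered cross-spatial branch; the scoring rule, dimensions, and naive fusion are illustrative assumptions, not the PSMIL implementation.

```python
# Illustrative sketch only: a toy two-branch MIL aggregator built around a minimal
# diagonal linear state-space scan. Not the PSMIL code.
import torch
import torch.nn as nn

class DiagonalSSM(nn.Module):
    """Minimal linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t (per channel)."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.full((dim,), 0.9))
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x):                      # x: (N, dim) ordered instance features
        h = torch.zeros(x.shape[1])
        ys = []
        for t in range(x.shape[0]):
            h = self.a * h + self.b * x[t]
            ys.append(self.c * h)
        return torch.stack(ys)                 # (N, dim)

def raster_order(feats, coords):
    """Spatial branch: order patches by their original 2D grid position (row-major)."""
    idx = torch.argsort(coords[:, 0] * 10000 + coords[:, 1])
    return feats[idx]

def prioritized_order(feats, scorer):
    """Cross-spatial branch: order patches by a learned significance score."""
    scores = scorer(feats).squeeze(-1)
    return feats[torch.argsort(scores, descending=True)]

dim = 64
feats = torch.randn(100, dim)                  # patch features of one WSI bag
coords = torch.randint(0, 50, (100, 2))        # original 2D patch positions
scorer = nn.Linear(dim, 1)
spatial = DiagonalSSM(dim)(raster_order(feats, coords))
cross = DiagonalSSM(dim)(prioritized_order(feats, scorer))
bag_emb = torch.cat([spatial.mean(0), cross.mean(0)])   # naive fusion of the two branches
print(bag_emb.shape)                                    # torch.Size([128])
```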
TranSAC: An unsupervised transferability metric based on task speciality and domain commonality
Qianshan Zhan, Xiao-Jun Zeng, Qian Wang
Pattern Recognition, Volume 176, Article 113137. Pub Date: 2026-08-01; Epub Date: 2026-01-29; DOI: 10.1016/j.patcog.2026.113137
In transfer learning, one fundamental problem is transferability estimation, where a metric measures transfer performance without training. Existing metrics face two issues: 1) they require target domain labels, and 2) they focus only on task speciality while ignoring the equally important domain commonality. To overcome these limitations, we propose TranSAC, a Transferability metric based on task Speciality And domain Commonality, which captures the separation between classes and the similarity between domains. Its main advantages are that it is: 1) unsupervised, 2) fine-tuning free, and 3) applicable to both source-dependent and source-free transfer scenarios. To achieve this, we investigate the upper and lower bounds of transfer performance based on fixed representations extracted from the pre-trained model. Theoretical results reveal that unsupervised transfer performance is characterized by entropy-based quantities that naturally reflect task speciality and domain commonality. These insights motivate the design of TranSAC, which integrates both factors to enhance transferability estimation. Extensive experiments are performed across 12 target datasets with 36 pre-trained models, including supervised CNNs, self-supervised CNNs, and ViTs. Results demonstrate the importance of domain commonality and task speciality, and establish TranSAC as superior to state-of-the-art metrics for pre-trained model ranking, target domain ranking, and source domain ranking.
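As an illustration of an entropy-based, unsupervised transferability score of the kind described (not the actual TranSAC formula), the sketch below combines a task-speciality term from soft-assignment entropy on target features with a domain-commonality term from the similarity of source and target feature means; the clustering choice and the weighting are assumptions.

```python
# Illustrative sketch only: an unsupervised score mixing class separation (low
# soft-assignment entropy on target features) with domain similarity (cosine of
# source/target feature means). The exact quantities in TranSAC differ.
import numpy as np
from sklearn.cluster import KMeans

def speciality(target_feats, k=10, temp=1.0):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(target_feats)
    d = np.linalg.norm(target_feats[:, None, :] - km.cluster_centers_[None], axis=-1)
    p = np.exp(-d / temp); p /= p.sum(axis=1, keepdims=True)     # soft assignments
    ent = -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    return 1.0 - ent / np.log(k)          # in [0, 1]; higher = better class separation

def commonality(source_feats, target_feats):
    mu_s, mu_t = source_feats.mean(0), target_feats.mean(0)
    return float(mu_s @ mu_t / (np.linalg.norm(mu_s) * np.linalg.norm(mu_t) + 1e-12))

def transferability(source_feats, target_feats, alpha=0.5, k=10):
    return alpha * speciality(target_feats, k) + (1 - alpha) * commonality(source_feats, target_feats)

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 64))          # fixed source-domain features from a frozen model
tgt = rng.normal(size=(300, 64)) + 0.1    # fixed target-domain features
print(transferability(src, tgt, k=5))
```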
Audio-visual perceptual quality measurement via multi-perspective spatio-temporal EEG analysis
Shuzhan Hu, Mingyu Li, Yang Liu, Weiwei Jiang, Bingrui Geng, Wei Zhong, Long Ye
Pattern Recognition, Volume 176, Article 113156. Pub Date: 2026-08-01; Epub Date: 2026-01-24; DOI: 10.1016/j.patcog.2026.113156
In human-centered communication systems, establishing human perception-aligned audio-visual quality assessment methods is crucial for enhancing multimedia system performance and service quality. However, conventional subjective evaluation methods based on user ratings are susceptible to biases induced by high-level cognitive processes. To address this limitation, we propose an electroencephalography (EEG) feature fusion approach to establish correlations between audio-visual distortions and perceptual experiences. Specifically, we construct an audio-visual degradation-EEG dataset by recording neural responses from subjects exposed to progressively degraded stimuli. Leveraging this dataset, we extract event-related potential (ERP) features to quantify variations in subjects’ perception of audio-visual quality, demonstrating the feasibility of EEG-based perceptual experience assessment. Capitalizing on EEG’s sensitivity to dynamic multimodal perceptual changes, we develop a multi-perspective feature fusion framework, incorporating a spatio-temporal feature fusion architecture and a diffusion-driven EEG augmentation strategy. This framework enables the extraction of experience-related features from single-trial EEG signals, establishing an EEG-based classifier to detect whether distortions induce perceptual experience alterations. Experimental results validate that EEG signals effectively reflect perception changes induced by quality degradation, while the proposed model achieves efficient and dynamic detection of perception alterations from single-trial EEG data.
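A minimal sketch of the kind of processing involved, assuming time-locked EEG epochs: ERP estimation by trial averaging and a tiny spatio-temporal convolutional classifier for single-trial decisions. Channel counts, window lengths, and the architecture are illustrative and are not the paper's model.

```python
# Illustrative sketch only: ERP averaging plus a toy spatio-temporal CNN for
# single-trial EEG classification. Not the paper's framework.
import torch
import torch.nn as nn

def erp_average(epochs):
    """epochs: (n_trials, n_channels, n_samples) time-locked to a distortion onset."""
    return epochs.mean(dim=0)                      # (n_channels, n_samples) ERP estimate

class SpatioTemporalNet(nn.Module):
    def __init__(self, n_channels=32, n_classes=2):
        super().__init__()
        self.temporal = nn.Conv2d(1, 8, kernel_size=(1, 25), padding=(0, 12))
        self.spatial = nn.Conv2d(8, 16, kernel_size=(n_channels, 1))   # mixes electrodes
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d((1, 8)), nn.Flatten(),
                                  nn.Linear(16 * 8, n_classes))

    def forward(self, x):                          # x: (B, n_channels, n_samples)
        x = x.unsqueeze(1)                         # (B, 1, C, T)
        x = torch.relu(self.temporal(x))
        x = torch.relu(self.spatial(x))            # (B, 16, 1, T)
        return self.head(x)

trials = torch.randn(40, 32, 256)                  # 40 trials, 32 channels, 1 s at 256 Hz
print(erp_average(trials).shape)                   # torch.Size([32, 256])
print(SpatioTemporalNet()(trials[:4]).shape)       # torch.Size([4, 2])
```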
One-step multi-view graph clustering via bottom-up structural learning
Wenzhe Liu, Li Jiang, Huibing Wang, Yong Zhang
Pattern Recognition, Volume 176, Article 113175. Pub Date: 2026-08-01; Epub Date: 2026-01-29; DOI: 10.1016/j.patcog.2026.113175
In recent years, tensor-based methods have achieved considerable success in multi-view clustering. However, current approaches have several limitations: 1) insufficient exploration of underlying similarity information (i.e., latent representations); 2) insufficient exploration of higher-order structural information both across and within views; 3) treating clustering learning independently from tensor learning and the overall learning framework. To address these issues, we propose a unified framework called Bottom-up Structural Exploration for One-step Multi-view Graph Clustering (BSE_OMGC). Specifically, we first employ an anchor strategy to build similarity graphs, reducing the complexity of graph learning. To deeply represent the underlying similarity information of the data and mitigate the influence of noise on similarity structures in the original space, BSE_OMGC adaptively separates the noise matrix from the similarity graphs to learn high-quality enhanced graphs. Subsequently, from the bottom up, the enhanced graphs serve as the foundation for constructing high-order tensors. We rotate the constructed tensors and apply the t-TNN to preserve their low-rank properties and better capture inter-view and intra-view higher-order structural information. Finally, we introduce a symmetric non-negative matrix factorization-based graph partitioning technique that learns non-negative embeddings during dynamic optimization to reveal the clustering results, unifying clustering learning within the entire learning framework. Extensive experiments on multiple real-world multi-view datasets, along with comparisons to state-of-the-art methods, demonstrate the effectiveness and robustness of the proposed approach.
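To ground two of the building blocks mentioned above, the NumPy sketch below constructs anchor-based similarity graphs per view and evaluates a t-SVD-style tensor nuclear norm of the stacked graph tensor via an FFT along the view mode; the full BSE_OMGC objective, noise separation, and solver are not reproduced.

```python
# Illustrative sketch only: anchor graphs per view and a tensor nuclear norm computed
# from Fourier-domain frontal slices. Not the BSE_OMGC optimization.
import numpy as np

def anchor_graph(X, anchors, sigma=1.0):
    """Row-stochastic similarity between n samples and m anchors (n x m)."""
    d2 = ((X[:, None, :] - anchors[None]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    return S / S.sum(axis=1, keepdims=True)

def tensor_nuclear_norm(T):
    """t-SVD-style nuclear norm of an n1 x n2 x n3 tensor: mean of slice nuclear
    norms after an FFT along the third mode."""
    Tf = np.fft.fft(T, axis=2)
    return sum(np.linalg.svd(Tf[:, :, k], compute_uv=False).sum()
               for k in range(T.shape[2])) / T.shape[2]

rng = np.random.default_rng(0)
views = [rng.normal(size=(100, d)) for d in (20, 30)]       # two views of 100 samples
anchors = [v[rng.choice(100, 10, replace=False)] for v in views]
graphs = [anchor_graph(v, a) for v, a in zip(views, anchors)]
G = np.stack(graphs, axis=2)                                # 100 x 10 x n_views tensor
print(tensor_nuclear_norm(G))
```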
Learning generalizable visual representations with causal diffusion model for controllable editing
Shanshan Huang, Lei Wang, Haoxuan Chen, Yuxuan Liang, Li Liu
Pattern Recognition, Volume 176, Article 113162. Pub Date: 2026-08-01; Epub Date: 2026-01-29; DOI: 10.1016/j.patcog.2026.113162
Representation learning has been widely employed to learn low-dimensional representations composed of multiple independent and interpretable generative factors, such as visual attributes in images, enabling controllable image editing by manipulating specific attributes in the learned representation space. However, in real-world scenarios, generative factors with semantic meanings are often causally related rather than independent. Previous methods built on the independence assumption fail to capture such causal relationships, even in supervised settings. To this end, we propose a diffusion model-based causal representation learning framework, named CausalDiffuser, which models causal prior distributions with structural causal models (SCMs) to explicitly characterize the causal relations among the underlying generative factors. Such a modelling scheme encourages the framework to learn latent representations that encode the causality among generative factors. Furthermore, a composite loss function is introduced to ensure causal disentanglement of the latent representations by incorporating supervision from the ground-truth factors (i.e., image labels). Empirical evaluations on one synthetic dataset and two real-world benchmark datasets show that our approach significantly outperforms state-of-the-art methods. CausalDiffuser effectively edits image attributes by restoring the causal relationships among generative factors and generates counterfactual images through intervention operations.
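As a toy illustration of an SCM-structured latent prior and a do-intervention (not the CausalDiffuser architecture), the sketch below samples latent factors from a linear SCM and clamps one factor while cutting its incoming edges; the adjacency matrix and factor semantics are assumptions.

```python
# Illustrative sketch only: a linear SCM over latent generative factors with a simple
# do-intervention. The connection to the diffusion model is not shown.
import numpy as np

def scm_sample(A, eps):
    """Sample from a linear SCM z = A^T z + eps, where A[i, j] = effect of factor i on j."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A.T, eps)

def do_intervention(A, eps, idx, value):
    """do(z_idx = value): cut incoming edges to idx and clamp it before propagating effects."""
    A_int = A.copy(); A_int[:, idx] = 0.0
    eps_int = eps.copy(); eps_int[idx] = value
    return scm_sample(A_int, eps_int)

# toy DAG over 3 factors, e.g. factor 0 causes factor 1, factor 2 independent (assumed semantics)
A = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
eps = np.array([0.5, 0.1, -0.3])
print(scm_sample(A, eps))                 # observational latent
print(do_intervention(A, eps, 0, 2.0))    # latent after intervening on factor 0
```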
Frequency-aligned supervision for few-shot neural rendering
Su-Ji Jang, Ue-Hwan Kim
Pattern Recognition, Volume 176, Article 113183. Pub Date: 2026-08-01; Epub Date: 2026-01-28; DOI: 10.1016/j.patcog.2026.113183
Neural rendering has shown significant potential in generating high-quality 3D scenes from sparse inputs. However, existing methods struggle to simultaneously capture both low-frequency global structures and high-frequency fine details, leading to suboptimal scene representations. To overcome this limitation, we propose a frequency-aligned supervision framework that explicitly separates the learning process into low-frequency and full-spectrum components. By introducing two sub-networks and aligning supervision signals at appropriate layers, our method enhances the formation of global structures while preserving fine details. Specifically, the low-frequency network (LFN) is supervised with low-pass targets (Gaussian-filtered images) to form global structures, while the full-spectrum network (FSN) is supervised with the original images to refine high-frequency details. The proposed approach is broadly applicable to MLP-based NeRF architectures without requiring major architectural modifications. Extensive experiments demonstrate that our method consistently improves PSNR, SSIM, and LPIPS across multiple NeRF variants and datasets, confirming its robustness in sparse input scenarios.
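A minimal sketch of the frequency-aligned loss pairing described above, assuming a separable Gaussian low-pass filter for the LFN targets and simple MSE terms; kernel size, sigma, and loss weights are illustrative choices rather than the paper's settings.

```python
# Illustrative sketch only: one rendered output supervised with a Gaussian-filtered
# (low-pass) target and one with the original image, as the abstract describes.
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma=2.0, ksize=9):
    """img: (B, C, H, W). Separable Gaussian low-pass filter."""
    x = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    k = torch.exp(-x ** 2 / (2 * sigma ** 2)); k /= k.sum()
    c = img.shape[1]
    kh = k.view(1, 1, 1, ksize).repeat(c, 1, 1, 1)
    kv = k.view(1, 1, ksize, 1).repeat(c, 1, 1, 1)
    img = F.conv2d(img, kh, padding=(0, ksize // 2), groups=c)
    return F.conv2d(img, kv, padding=(ksize // 2, 0), groups=c)

def frequency_aligned_loss(lfn_render, fsn_render, target, w_low=1.0, w_full=1.0):
    low_target = gaussian_blur(target)                 # low-pass supervision for the LFN
    return w_low * F.mse_loss(lfn_render, low_target) + w_full * F.mse_loss(fsn_render, target)

target = torch.rand(2, 3, 64, 64)                      # ground-truth training views
lfn_out, fsn_out = torch.rand_like(target), torch.rand_like(target)
print(frequency_aligned_loss(lfn_out, fsn_out, target).item())
```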
Joint asymmetric discrete hashing for cross-modal retrieval
Jiaxing Li, Lin Jiang, Zuopeng Yang, Xiaozhao Fang, Shengli Xie, Yong Xu
Pattern Recognition, Volume 176, Article 113180. Pub Date: 2026-08-01; Epub Date: 2026-01-28; DOI: 10.1016/j.patcog.2026.113180
Cross-modal hashing is a promising practical technique for information retrieval over multimedia data. However, several technical hurdles remain, e.g., how to further reduce the heterogeneous semantic gaps between cross-modal data, how to extract cross-modal knowledge by jointly training on data from different modalities, and how to better leverage label information to generate more discriminative hash codes. To overcome these challenges, this paper proposes a joint asymmetric discrete hashing (JADH) method for cross-modal retrieval. By leveraging a kernel mapping operation, JADH extracts non-linear features of the cross-modal data to better preserve semantic information when learning the latent common space. Then, a joint asymmetric hash-code learning term is designed to learn hash codes for data from different modalities jointly, so that more cross-modal information is preserved and the heterogeneous semantic gaps are effectively reduced. Finally, a log-likelihood similarity-preserving term is proposed to boost hash-code learning from the similarity matrix, while a classifier learning term further improves the quality of the learned hash codes. In addition, an alternating algorithm is derived to solve the optimization problem of JADH efficiently. Experimental results on four widely used datasets show that JADH outperforms state-of-the-art baseline methods for hashing-based cross-modal retrieval in both accuracy and efficiency.
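To illustrate the kernel-mapping and binary-coding steps in a generic way (the joint asymmetric objective and its solver are not reproduced), the NumPy sketch below maps each modality through an RBF kernel over anchor points, projects to sign-thresholded codes, and ranks a database by Hamming distance; the projection matrices here are random placeholders rather than learned ones.

```python
# Illustrative sketch only: kernelized features -> linear projection -> sign codes ->
# Hamming ranking. Not the JADH learning objective.
import numpy as np

def kernel_map(X, anchors, sigma=1.0):
    d2 = ((X[:, None, :] - anchors[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))                 # (n, m) non-linear features

def hash_codes(K, W):
    return np.sign(K @ W)                                  # (n, n_bits) in {-1, +1}

def hamming_rank(query_code, db_codes):
    dist = (db_codes.shape[1] - db_codes @ query_code) / 2 # Hamming distance via inner product
    return np.argsort(dist)

rng = np.random.default_rng(0)
img_feats, txt_feats = rng.normal(size=(500, 128)), rng.normal(size=(500, 64))
anchors_i, anchors_t = img_feats[:50], txt_feats[:50]
W_i, W_t = rng.normal(size=(50, 32)), rng.normal(size=(50, 32))   # 32-bit codes per modality
B_img = hash_codes(kernel_map(img_feats, anchors_i), W_i)
B_txt = hash_codes(kernel_map(txt_feats, anchors_t), W_t)
print(hamming_rank(B_txt[0], B_img)[:5])                  # text query -> top-5 image indices
```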
S2I-DiT: Unlocking the semantic-to-image transferability by fine-tuning large diffusion transformer models
Gang Li, Enze Xie, Chongjian Ge, Xiang Li, Lingyu Si, Changwen Zheng, Zhenguo Li
Pattern Recognition, Volume 176, Article 113158. Pub Date: 2026-08-01; Epub Date: 2026-01-25; DOI: 10.1016/j.patcog.2026.113158
Denoising Diffusion Probabilistic Models (DDPMs) have made significant progress in image generation. Recent works in semantic-to-image (S2I) synthesis have also shifted from the previously de facto GAN-based methods to DDPMs, yielding better results. However, these works mostly employ a U-Net structure and a vanilla training-from-scratch scheme for S2I, overlooking the potential benefits offered by task-related pre-training. In this work, we introduce a Transformer-based architecture, namely S2I-DiT, and reconsider the merits of a pre-trained large diffusion model for cross-task adaptation (i.e., from class-conditional generation to S2I). In S2I-DiT, we propose integrating semantic embedders within Diffusion Transformers (DiTs) to maximize the utilization of semantic information. The semantic embedder densely encodes semantic layouts to guide the adaptive normalization process. We configure semantic embedders in a layer-wise manner to learn pixel-level correspondence, enabling finer-grained semantic-to-image control. Besides, to fully unleash the cross-task transferability of DDPMs, we introduce a two-stage fine-tuning strategy, which first adapts the semantic embedders in the pixel-level space and then fine-tunes the partial/entire model for cross-task adaptation. Notably, S2I-DiT pioneers the application of Large Diffusion Transformers to cross-task fine-tuning. Extensive experiments on four benchmark datasets demonstrate S2I-DiT’s effectiveness, as it achieves state-of-the-art performance in terms of quality (FID) and diversity (LPIPS) while consuming fewer training iterations. This work establishes a new state-of-the-art for semantic-to-image generation and provides valuable insights into the cross-task transferability of large generative models.
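As a rough sketch of layer-wise semantic conditioning through adaptive normalization (an assumption-laden stand-in, not the S2I-DiT released code), the following PyTorch snippet encodes a semantic layout into per-token scale and shift that modulate a transformer block's normalized activations.

```python
# Illustrative sketch only: a semantic embedder producing per-patch (scale, shift)
# that modulate a normalized transformer block. Dimensions and layout are assumptions.
import torch
import torch.nn as nn

class SemanticEmbedder(nn.Module):
    """Encodes a semantic layout map to per-patch (scale, shift) modulation."""
    def __init__(self, n_classes=20, dim=256, patch=8):
        super().__init__()
        self.proj = nn.Conv2d(n_classes, 2 * dim, kernel_size=patch, stride=patch)

    def forward(self, layout):                      # layout: (B, n_classes, H, W)
        mod = self.proj(layout).flatten(2).transpose(1, 2)   # (B, n_tokens, 2*dim)
        return mod.chunk(2, dim=-1)                 # scale, shift: (B, n_tokens, dim)

class ModulatedBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, scale, shift):             # x: (B, n_tokens, dim)
        h = self.norm(x) * (1 + scale) + shift      # semantics-conditioned normalization
        h = x + self.attn(h, h, h, need_weights=False)[0]
        return h + self.mlp(self.norm(h) * (1 + scale) + shift)

tokens = torch.randn(2, 64, 256)                    # noisy-latent patch tokens (8x8 grid)
layout = torch.randn(2, 20, 64, 64)                 # semantic layout, 20 classes
scale, shift = SemanticEmbedder()(layout)
print(ModulatedBlock()(tokens, scale, shift).shape) # torch.Size([2, 64, 256])
```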
Generalizable face forgery detection via mining single-step reconstruction difference
Kai Zhou, Guanglu Sun, Linsen Yu, Jun Wang
Pattern Recognition, Volume 176, Article 113265. Pub Date: 2026-08-01; Epub Date: 2026-02-10; DOI: 10.1016/j.patcog.2026.113265
Existing face forgery detection methods mainly focus on capturing specific artifacts. While achieving high accuracy on in-distribution data, they often generalize poorly to unseen manipulation techniques because the learned feature representation is strongly correlated with the training set. To mitigate this strong correlation and move towards a more generalizable feature representation, we propose a novel face forgery detection framework based on the Single-step Reconstruction Difference (SRD). Our approach explores more generalizable features by mining the differences between the original and single-step reconstructed features of both real and fake faces. More specifically, we design a feature enhancement module that processes and refines the single-step reconstruction difference, progressively integrating forgery-related clues into the network features through an attention mechanism. In addition, we design a Frequency-Constrained Contrastive Loss (FCC Loss) to learn discriminative and robust features by contrasting real and fake faces using frequency-domain information. Experimental results demonstrate that the proposed method not only exhibits excellent generalization performance across different datasets but also shows strong robustness against various image attacks. Our code is released at: https://github.com/zhouk369/SRD.
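A toy sketch of the single-step reconstruction-difference idea, assuming a frozen autoencoder as the reconstructor and a simple token-wise attention re-weighting; the actual SRD modules and the Frequency-Constrained Contrastive Loss follow the paper's definitions, which are not reproduced here.

```python
# Illustrative sketch only: one reconstruction step, its difference from the input
# features, and an attention-style re-weighting of backbone tokens. Not the SRD code.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.enc = nn.Linear(dim, 32)
        self.dec = nn.Linear(32, dim)

    def forward(self, feats):                       # one reconstruction step, no iteration
        return self.dec(torch.relu(self.enc(feats)))

def enhance_with_srd(feats, autoencoder):
    """feats: (B, N, dim) spatial tokens from a frozen backbone."""
    with torch.no_grad():
        recon = autoencoder(feats)
    diff = (feats - recon).abs()                    # single-step reconstruction difference
    attn = torch.softmax(diff.mean(-1, keepdim=True), dim=1)   # token-wise attention
    return feats + attn * feats                     # emphasize forgery-related tokens

feats = torch.randn(4, 49, 128)                     # 7x7 token grid per face image
ae = TinyAutoencoder()
print(enhance_with_srd(feats, ae).shape)            # torch.Size([4, 49, 128])
```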