
Latest Articles in Pattern Recognition

Self-supervised video object segmentation via pseudo label rectification
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-12 | DOI: 10.1016/j.patcog.2025.111428
Pinxue Guo, Wei Zhang, Xiaoqiang Li, Jianping Fan, Wenqiang Zhang
In this paper, we propose a novel self-supervised framework for video object segmentation (VOS) that consists of siamese encoders and bi-decoders. The siamese encoders extract multi-level features and generate pseudo labels for each pixel by cross attention in a visual-semantic space; they are learned via a colorization task without any labeled video data. The bi-decoders take in features from different layers of the encoder and output refined segmentation masks. The bi-decoders are trained on the pseudo labels, and in turn the pseudo labels are rectified via bi-decoder mutual learning: the variation between the bi-decoders' outputs is minimized so that the gap between the pseudo labels and the ground truth is reduced. Experimental results on the challenging DAVIS-2017 and YouTube-VOS datasets demonstrate the effectiveness of our proposed approach.
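To make the bi-decoder mutual-learning idea concrete, here is a minimal PyTorch-style sketch of a loss that supervises two decoders with the pseudo labels and penalizes disagreement between their predictions, plus a confidence-based rectification step; the function names, loss weighting, and confidence threshold are illustrative assumptions rather than the authors' implementation.

```python
# Sketch only: supervise both decoders with pseudo labels and minimize the
# variation between their outputs (the loss weight and threshold are assumed).
import torch.nn.functional as F

def bi_decoder_loss(logits_a, logits_b, pseudo_labels, lambda_consistency=1.0):
    """logits_*: (B, C, H, W) outputs of the two decoders;
    pseudo_labels: (B, H, W) integer pseudo labels (255 = ignore)."""
    ce = (F.cross_entropy(logits_a, pseudo_labels, ignore_index=255)
          + F.cross_entropy(logits_b, pseudo_labels, ignore_index=255))
    p_a, p_b = F.softmax(logits_a, dim=1), F.softmax(logits_b, dim=1)
    consistency = F.mse_loss(p_a, p_b)          # variation between the bi-decoders
    return ce + lambda_consistency * consistency

def rectify_pseudo_labels(logits_a, logits_b, threshold=0.8):
    """Rectify pseudo labels: keep only pixels where the averaged decoder
    prediction is confident; low-confidence pixels get the ignore index."""
    p_mean = 0.5 * (F.softmax(logits_a, dim=1) + F.softmax(logits_b, dim=1))
    conf, labels = p_mean.max(dim=1)
    labels[conf < threshold] = 255
    return labels
```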
{"title":"Self-supervised video object segmentation via pseudo label rectification","authors":"Pinxue Guo ,&nbsp;Wei Zhang ,&nbsp;Xiaoqiang Li ,&nbsp;Jianping Fan ,&nbsp;Wenqiang Zhang","doi":"10.1016/j.patcog.2025.111428","DOIUrl":"10.1016/j.patcog.2025.111428","url":null,"abstract":"<div><div>In this paper we propose a novel self-supervised framework for video object segmentation (VOS) which consists of siamese encoders and bi-decoders. Siamese encoders extract multi-level features and generate pseudo labels for each pixel by cross attention in visual-semantic space. Such siamese encoders are learned via the colorization task without any labeled video data. Bi-decoders take in features from different layers of the encoder and output refined segmentation masks. Such bi-decoders are trained by the pseudo labels, and in turn pseudo labels are rectified via bi-decoders mutual learning. The variation of the bi-decoders’ outputs is minimized such that the gap between pseudo labels and the ground-truth is reduced. Experimental results on the challenging datasets DAVIS-2017 and YouTube-VOS demonstrate the effectiveness of our proposed approach.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111428"},"PeriodicalIF":7.5,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143454227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Audio-visual representation learning via knowledge distillation from speech foundation models
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-12 | DOI: 10.1016/j.patcog.2025.111432
Jing-Xuan Zhang, Genshun Wan, Jianqing Gao, Zhen-Hua Ling
Audio-visual representation learning is crucial for advancing multimodal speech processing tasks, such as lipreading and audio-visual speech recognition. Recently, speech foundation models (SFMs) have shown remarkable generalization capabilities across various speech-related tasks. Building on this progress, we propose an audio-visual representation learning model that leverages cross-modal knowledge distillation from SFMs. In our method, SFMs serve as teachers, from which multi-layer hidden representations are extracted using clean audio inputs. We also introduce a multi-teacher ensemble method to distill the student, which receives audio-visual data as inputs. A novel representational knowledge distillation loss is employed to train the student during pretraining, which is also applied during finetuning to further enhance the performance on downstream tasks. Our experiments utilized both a self-supervised SFM, WavLM, and a supervised SFM, iFLYTEK-speech. The results demonstrated that our proposed method achieved superior or at least comparable performance to previous state-of-the-art baselines across automatic speech recognition, visual speech recognition, and audio-visual speech recognition tasks. Additionally, comprehensive ablation studies and the visualization of learned representations were conducted to evaluate the effectiveness of our proposed method.
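As an illustration of layer-wise, multi-teacher representational distillation of the kind described above, the sketch below matches projected student hidden states to the average of the teachers' hidden states; the per-layer linear projections and the MSE objective are assumptions, not the paper's exact loss.

```python
# Illustrative sketch: distill multi-layer teacher representations into the
# student by matching projected student features to the multi-teacher average.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationKDLoss(nn.Module):
    def __init__(self, student_dim, teacher_dim, num_layers):
        super().__init__()
        # one linear projection per distilled layer (assumed design choice)
        self.proj = nn.ModuleList(nn.Linear(student_dim, teacher_dim)
                                  for _ in range(num_layers))

    def forward(self, student_layers, teachers_layers):
        """student_layers: list of (B, T, student_dim) hidden states;
        teachers_layers: list over teachers, each a list of (B, T, teacher_dim)."""
        loss = 0.0
        for i, s in enumerate(student_layers):
            # multi-teacher ensemble: average the teachers' layer-i hidden states
            t = torch.stack([layers[i] for layers in teachers_layers]).mean(dim=0)
            loss = loss + F.mse_loss(self.proj[i](s), t.detach())
        return loss / len(student_layers)
```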
{"title":"Audio-visual representation learning via knowledge distillation from speech foundation models","authors":"Jing-Xuan Zhang ,&nbsp;Genshun Wan ,&nbsp;Jianqing Gao ,&nbsp;Zhen-Hua Ling","doi":"10.1016/j.patcog.2025.111432","DOIUrl":"10.1016/j.patcog.2025.111432","url":null,"abstract":"<div><div>Audio-visual representation learning is crucial for advancing multimodal speech processing tasks, such as lipreading and audio-visual speech recognition. Recently, speech foundation models (SFMs) have shown remarkable generalization capabilities across various speech-related tasks. Building on this progress, we propose an audio-visual representation learning model that leverages cross-modal knowledge distillation from SFMs. In our method, SFMs serve as teachers, from which multi-layer hidden representations are extracted using clean audio inputs. We also introduce a multi-teacher ensemble method to distill the student, which receives audio-visual data as inputs. A novel representational knowledge distillation loss is employed to train the student during pretraining, which is also applied during finetuning to further enhance the performance on downstream tasks. Our experiments utilized both a self-supervised SFM, WavLM, and a supervised SFM, iFLYTEK-speech. The results demonstrated that our proposed method achieved superior or at least comparable performance to previous state-of-the-art baselines across automatic speech recognition, visual speech recognition, and audio-visual speech recognition tasks. Additionally, comprehensive ablation studies and the visualization of learned representations were conducted to evaluate the effectiveness of our proposed method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111432"},"PeriodicalIF":7.5,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143402804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An effective bipartite graph fusion and contrastive label correlation for multi-view multi-label classification
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.patcog.2025.111430
Dawei Zhao, Hong Li, Yixiang Lu, Dong Sun, Qingwei Gao
Graph-based multi-view multi-label learning effectively utilizes the graph structure underlying the samples to integrate information from different views. However, most existing graph construction techniques are computationally complex. We propose an anchor-based bipartite graph fusion method to accelerate graph learning and perform label propagation. First, we employ an ensemble learning strategy that assigns weights to different views to capture complementary information. Second, heterogeneous graphs from different views are linearly fused to obtain a consensus graph, and graph contrastive learning is utilized to bring inter-class relationships closer and enhance the quality of label correlation. Finally, we incorporate anchor samples into the decision-making process and jointly optimize the model using bipartite graph fusion and soft label classification with nonlinear extensions. Experimental results on multiple real-world benchmark datasets demonstrate the effectiveness and scalability of our approach compared to state-of-the-art methods.
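The weighted linear fusion step can be pictured with a short NumPy sketch: per-view anchor bipartite graphs (sample-by-anchor affinity matrices) are combined into a consensus graph under normalized view weights; how the weights are learned is omitted here, so treat this as an assumption-laden illustration.

```python
# Minimal sketch of linearly fusing per-view anchor bipartite graphs into a
# consensus graph; the view weights are taken as given here.
import numpy as np

def fuse_bipartite_graphs(view_graphs, view_weights):
    """view_graphs: list of (n_samples, n_anchors) affinity matrices, one per view;
    view_weights: non-negative weights, one per view."""
    w = np.asarray(view_weights, dtype=float)
    w = w / w.sum()                      # convex combination of the views
    return sum(wi * g for wi, g in zip(w, view_graphs))

# usage: Z = fuse_bipartite_graphs([Z1, Z2, Z3], [0.5, 0.3, 0.2])
```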
{"title":"An effective bipartite graph fusion and contrastive label correlation for multi-view multi-label classification","authors":"Dawei Zhao ,&nbsp;Hong Li ,&nbsp;Yixiang Lu ,&nbsp;Dong Sun ,&nbsp;Qingwei Gao","doi":"10.1016/j.patcog.2025.111430","DOIUrl":"10.1016/j.patcog.2025.111430","url":null,"abstract":"<div><div>Graph-based multi-view multi-label learning effectively utilizes the graph structure underlying the samples to integrate information from different views. However, most existing graph construction techniques are computationally complex. We propose an anchor-based bipartite graph fusion method to accelerate graph learning and perform label propagation. First, we employ an ensemble learning strategy that assigns weights to different views to capture complementary information. Second, heterogeneous graphs from different views are linearly fused to obtain a consensus graph, and graph comparative learning is utilized to bring inter-class relationships closer and enhance the quality of label correlation. Finally, we incorporate anchor samples into the decision-making process and jointly optimize the model using bipartite graph fusion and soft label classification with nonlinear extensions. Experimental results on multiple real-world benchmark datasets demonstrate the effectiveness and scalability of our approach compared to state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111430"},"PeriodicalIF":7.5,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143387706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
WtNGAN: Unpaired image translation from white light images to narrow-band images
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.patcog.2025.111431
Qinghua Lin, Zuoyong Li, Kun Zeng, Jie Wen, Yuting Jiang, Jian Chen
As one of the most dangerous cancers, gastric cancer poses a serious threat to human health. Currently, gastroscopy remains the preferred method for gastric cancer diagnosis. In gastroscopy, white-light imaging and narrow-band imaging are two essential modalities, which opens up the possibility of deep learning-based multimodal-assisted diagnosis. However, no paired dataset of white-light images (WLIs) and narrow-band images (NBIs) exists, which hinders the development of such methods. To address this problem, we propose an unpaired image-to-image translation network for translating WLIs to NBIs. Specifically, we first design a generative adversarial network based on Vision Mamba. The generator enhances the capability to represent details by establishing long-range dependencies and generates images similar to authentic images. Then, we propose a structural consistency constraint to preserve the original tissue structure in the generated images. We also utilize contrastive learning (CL) to maximize the information interaction between the source and target domains. We conduct extensive experiments on a private gastroscopy dataset for translation between WLIs and NBIs. To further verify the effectiveness of the proposed method, we also perform translation between T1 and T2 magnetic resonance images (MRIs) on the BraTS 2021 dataset. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods.
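The abstract does not spell out the structural consistency constraint, so the sketch below shows one plausible form of such a term, matching the spatial gradients of the source white-light image and the translated narrow-band image; the gradient-matching choice is an assumption, not the paper's formulation.

```python
# Hedged sketch of a structural-consistency term: penalize differences between
# the spatial gradients of the source image and the translated image.
import torch.nn.functional as F

def image_gradients(x):
    """x: (B, C, H, W); returns horizontal and vertical finite differences."""
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return dx, dy

def structural_consistency_loss(source, translated):
    sdx, sdy = image_gradients(source)
    tdx, tdy = image_gradients(translated)
    return F.l1_loss(tdx, sdx) + F.l1_loss(tdy, sdy)
```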
{"title":"WtNGAN: Unpaired image translation from white light images to narrow-band images","authors":"Qinghua Lin ,&nbsp;Zuoyong Li ,&nbsp;Kun Zeng ,&nbsp;Jie Wen ,&nbsp;Yuting Jiang ,&nbsp;Jian Chen","doi":"10.1016/j.patcog.2025.111431","DOIUrl":"10.1016/j.patcog.2025.111431","url":null,"abstract":"<div><div>As one of the most dangerous cancers, gastric cancer poses a serious threat to human health. Currently, gastroscopy remains the preferred method for gastric cancer diagnosis. In gastroscopy, white light and narrow-band light image are two necessary modalities providing deep learning-based multimodal-assisted diagnosis possibilities. However, there is no paired dataset of white-light images (WLIs) and narrow-band images (NBIs), which hinders the development of these methods. To address this problem, we propose an unpaired image-to-image translation network for translating WLI to NBI. Specifically, we first design a generative adversarial network based on Vision Mamba. The generator enhances the detailed representation capability by establishing long-range dependencies and generating images similar to authentic images. Then, we propose a structural consistency constraint to preserve the original tissue structure of the generated images. We also utilize contrastive learning (CL) to maximize the information interaction between the source and target domains. We conduct extensive experiments on a private gastroscopy dataset for translation between WLIs and NBIs. To verify the effectiveness of the proposed method, we also perform the translation between T1 and T2 magnetic resonance images (MRIs) on the BraTS 2021 dataset. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111431"},"PeriodicalIF":7.5,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Max360IQ: Blind omnidirectional image quality assessment with multi-axis attention
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.patcog.2025.111429
Jiebin Yan, Ziwen Tan, Yuming Fang, Jiale Rao, Yifan Zuo
An omnidirectional image, also called a 360-degree image, captures the entire 360-degree scene, thereby providing a more realistic immersive experience for users than general 2D and stereoscopic images. At the same time, this property poses great challenges for measuring the perceptual quality of omnidirectional images, which is closely related to users' quality of experience, especially when the omnidirectional images suffer from non-uniform distortion. In this paper, we propose a novel and effective blind omnidirectional image quality assessment (BOIQA) model with multi-axis attention (Max360IQ), which can proficiently measure the quality of both uniformly and non-uniformly distorted omnidirectional images. Specifically, the proposed Max360IQ is mainly composed of a backbone with stacked multi-axis attention modules that capture both global and local spatial interactions of the extracted viewports, a multi-scale feature integration (MSFI) module that fuses multi-scale features, and a quality regression module with deep semantic guidance that predicts the quality of omnidirectional images. Experimental results demonstrate that the proposed Max360IQ outperforms the state-of-the-art Assessor360 by 3.6% in terms of SRCC on the JUFE database with non-uniform distortion, and gains improvements of 0.4% and 0.8% in terms of SRCC on the OIQA and CVIQ databases, respectively. The source code is available at https://github.com/WenJuing/Max360IQ.
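As a generic stand-in for the multi-scale feature integration and quality regression stages named above (not the released Max360IQ code; the projection width and pooling scheme are assumptions), a compact PyTorch sketch might look like this:

```python
# Simplified sketch: project multi-scale viewport features to a common width,
# pool them, and regress a single quality score.
import torch
import torch.nn as nn

class MultiScaleFusionRegressor(nn.Module):
    def __init__(self, in_dims, hidden=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(d, hidden, kernel_size=1) for d in in_dims)
        self.head = nn.Sequential(nn.Linear(hidden * len(in_dims), hidden),
                                  nn.ReLU(inplace=True),
                                  nn.Linear(hidden, 1))

    def forward(self, features):
        """features: list of (B, C_i, H_i, W_i) maps from different backbone stages."""
        pooled = [p(f).mean(dim=(2, 3)) for p, f in zip(self.proj, features)]
        return self.head(torch.cat(pooled, dim=1)).squeeze(-1)   # (B,) quality scores
```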
{"title":"Max360IQ: Blind omnidirectional image quality assessment with multi-axis attention","authors":"Jiebin Yan,&nbsp;Ziwen Tan,&nbsp;Yuming Fang,&nbsp;Jiale Rao,&nbsp;Yifan Zuo","doi":"10.1016/j.patcog.2025.111429","DOIUrl":"10.1016/j.patcog.2025.111429","url":null,"abstract":"<div><div>Omnidirectional image, also called 360-degree image, is able to capture the entire 360-degree scene, thereby providing more realistic immersive feelings for users than general 2D image and stereoscopic image. Meanwhile, this feature brings great challenges to measuring the perceptual quality of omnidirectional images, which is closely related to users’ quality of experience, especially when the omnidirectional images suffer from non-uniform distortion. In this paper, we propose a novel and effective blind omnidirectional image quality assessment (BOIQA) model with multi-axis attention (Max360IQ), which can proficiently measure not only the quality of uniformly distorted omnidirectional images but also the quality of non-uniformly distorted omnidirectional images. Specifically, the proposed Max360IQ is mainly composed of a backbone with stacked multi-axis attention modules for capturing both global and local spatial interactions of extracted viewports, a multi-scale feature integration (MSFI) module to fuse multi-scale features and a quality regression module with deep semantic guidance for predicting the quality of omnidirectional images. Experimental results demonstrate that the proposed Max360IQ outperforms the state-of-the-art Assessor360 by 3.6% in terms of SRCC on the JUFE database with non-uniform distortion, and gains improvement of 0.4% and 0.8% in terms of SRCC on the OIQA and CVIQ databases, respectively. The source code is available at <span><span>https://github.com/WenJuing/Max360IQ</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111429"},"PeriodicalIF":7.5,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A deep learning approach for effective classification of fingerprint patterns and human behavior analysis
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.patcog.2025.111439
Atul Bhimrao Mokal, Brijendra Gupta
Fingerprint classification is essential for effective and precise fingerprint identification, but high intra-class variation, small inter-class differences, and noise limit the performance of prior techniques. The aim is to provide a safe and convenient identification and authentication system, and to offer psychologists a more precise reference for identifying and cataloguing human behavior. The goal of this study is to identify human behavioral traits based on fingerprint patterns. This paper presents an automated deep model that classifies fingerprints for analyzing human behavior. A Gaussian filter is applied to remove noise from the image, after which important features, such as texture-based features and minutiae features, are extracted. A Deep Convolutional Neural Network (DCNN) is used to determine fingerprint patterns, and Gannet Bald Optimization (GBO) is employed to train the DCNN to produce the classified patterns: left loop, plain arch, right loop, tented arch, and whorl. Each classified pattern is then matched against dictionaries for human behavior recognition. The proposed GBO-based DCNN achieves high performance and better competence than traditional models.
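A toy version of the pipeline (Gaussian denoising followed by a CNN that outputs the five pattern classes) is sketched below; the network layout is invented for illustration, and the paper's Gannet Bald Optimization training procedure is not reproduced.

```python
# Illustrative pipeline only: Gaussian blur for noise suppression, then a small
# CNN classifier over the five fingerprint pattern classes.
import torch.nn as nn
from torchvision.transforms import GaussianBlur

CLASSES = ["left loop", "plain arch", "right loop", "tented arch", "whorl"]

class FingerprintCNN(nn.Module):
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.blur = GaussianBlur(kernel_size=5)          # denoising step
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                                # x: (B, 1, H, W) grayscale prints
        return self.classifier(self.features(self.blur(x)).flatten(1))
```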
{"title":"A deep learning approach for effective classification of fingerprint patterns and human behavior analysis","authors":"Atul Bhimrao Mokal,&nbsp;Brijendra Gupta","doi":"10.1016/j.patcog.2025.111439","DOIUrl":"10.1016/j.patcog.2025.111439","url":null,"abstract":"<div><div>The classification of fingerprints is an imperative guarantee for effective and precise fingerprint detection particularly when discovering fingerprints, but because of higher intra-class variations, smaller inter-class alterations, and clatters the prior techniques need to improve their performance. The aim is to provide a safe and convenient identification and authentication system. It offers more precise consultation for psychologists to identify the behavior of humans and catalogue them. The goal of the study is to identify the behavioral traits of the human based on fingerprint patterns. An automated deep model for the classification of fingerprints for analyzing human behaviours is provided in this paper. Gaussian filter is engaged for abandoning noise from an image. Thereafter the important features like texture-based features and minutiae features are extracted. For determining fingerprint patterns, a Deep Convolutional Neural Network (DCNN) is utilized. The Gannet Bald Optimization (GBO) is employed for training the DCNN to generate the classified patterns that include left loop, plain arch, right loop, tented arch, and whorl. Moreover, each classified pattern is matched with the dictionaries for human behaviour recognition. The proposed GBO-based DCNN obtained high performance and provided better competence when compared with the traditional models.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111439"},"PeriodicalIF":7.5,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143508873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Knowledge enhanced prompt learning framework for financial news recommendation
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-11 | DOI: 10.1016/j.patcog.2025.111461
ShaoBo Sun, Xiaoming Pan, Shuang Qi, Jun Gao
The aim of financial news recommendation systems is to deliver personalized and timely financial information. Traditional methods face several challenges: financial news is complex, requiring stock-related external knowledge and an understanding of users' interests in various stocks, industries, and concepts, and the timeliness of the financial domain necessitates adaptable recommender systems, especially in few-shot and cold-start scenarios. To address these challenges, we propose a knowledge-enhanced prompt learning framework for financial news recommendation (FNRKPL). FNRKPL incorporates a financial news knowledge graph and transforms triple information into prompt language to strengthen the recommendation model's knowledge base. Personalized prompt templates are designed to account for users' topic preferences and sentiment tendencies, integrating knowledge, topic, and sentiment prompts. Furthermore, a knowledge-enhanced prompt learning mechanism enhances the model's generalization and adaptability in few-shot and cold-start scenarios. Extensive experiments on real-world corporate datasets validate FNRKPL's effectiveness in both data-rich and resource-poor conditions.
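To illustrate how knowledge-graph triples and user preferences can be verbalized into a prompt, here is a hypothetical template; the wording and field names are invented for illustration and are not FNRKPL's actual prompt design.

```python
# Hypothetical prompt construction from triples plus topic/sentiment preferences.
def build_prompt(news_title, triples, user_topics, user_sentiment):
    """triples: list of (head, relation, tail) tuples about stocks, industries, concepts."""
    knowledge = "; ".join(f"{h} {r} {t}" for h, r, t in triples)
    return (f"News: {news_title}\n"
            f"Related knowledge: {knowledge}\n"
            f"User is interested in: {', '.join(user_topics)}\n"
            f"User sentiment tendency: {user_sentiment}\n"
            f"Should this news be recommended to the user? Answer yes or no.")

# usage:
# build_prompt("Chipmaker X beats earnings",
#              [("Chipmaker X", "belongs to", "semiconductor industry")],
#              ["semiconductors", "AI"], "bullish")
```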
{"title":"Knowledge enhanced prompt learning framework for financial news recommendation","authors":"ShaoBo Sun ,&nbsp;Xiaoming Pan ,&nbsp;Shuang Qi ,&nbsp;Jun Gao","doi":"10.1016/j.patcog.2025.111461","DOIUrl":"10.1016/j.patcog.2025.111461","url":null,"abstract":"<div><div>The aim of financial news recommendation systems is to deliver personalized and timely financial information. Traditional methods face challenges, including the complexity of financial news, which requires stock-related external knowledge and accounts for users' interests in various stocks, industries, and concepts. Additionally, the financial domain's timeliness necessitates adaptable recommender systems, especially in few-shot and cold-start scenarios. To address these challenges, we propose a knowledge-enhanced prompt learning framework for financial news recommendation (FNRKPL). FNRKPL incorporates a financial news knowledge graph and transforms triple information into prompt language to strengthen the recommendation model's knowledge base. Personalized prompt templates are designed to account for users' topic preferences and sentiment tendencies, integrating knowledge, topic, and sentiment prompts. Furthermore, a knowledge-enhanced prompt learning mechanism enhances the model's generalization and adaptability in few-shot and cold-start scenarios. Extensive experiments on real-world corporate datasets validate FNRKPL's effectiveness in both data-rich and resource-poor conditions.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"163 ","pages":"Article 111461"},"PeriodicalIF":7.5,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143444503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A novel deep neural network for identification of sex and ethnicity based on unknown skulls
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-10 | DOI: 10.1016/j.patcog.2025.111450
Haibo Zhang, Qianhong Li, Xizhi Wang, Qianyi Wu, Chaohui Ma, Mingquan Zhou, Guohua Geng
The determination of sex and ethnicity is crucial for the identification of unknown human remains in biology and forensic science. These two biological traits can be effectively assessed from the skull, which makes it one of the most essential structures for this purpose. However, the simultaneous determination of sex and ethnicity remains a challenge in the identification of unknown individuals. In this study, a multi-attribute recognition framework for unknown skulls, which integrates multitask learning and multiview cross-attention, is proposed. Multi-angle images of the skull first serve as input to a parallel convolutional neural network, yielding independent view features. To improve the performance of skull multi-attribute recognition, a view cross-attention mechanism is then introduced, which uses the independent view features of the skull to obtain global view features. Afterwards, the final output structure is divided into two branches, one used to identify the sex of the skull and the other to identify its ethnicity. The experiment involves 214 samples, consisting of 79 samples (41 males and 38 females) from the Han Chinese population in northern China and 135 samples (60 males and 75 females) from the Uyghur population in Xinjiang, China. The results demonstrate that the skull multi-attribute recognition model performs best when ResNet18 is used as the feature-sharing network: the sex and ethnicity identifications achieve accuracies of 95.94% and 98.45%, respectively. This verifies that the proposed method has high accuracy and generalization ability.
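A schematic of the framework's shape (shared ResNet18 encoder over views, view cross-attention, and two task heads) is sketched below; the attention configuration and pooling are assumptions, and the sketch is not the authors' code.

```python
# Sketch: shared per-view encoder, view cross-attention, and two heads that
# jointly predict sex and ethnicity from multi-angle skull images.
import torch.nn as nn
from torchvision.models import resnet18

class SkullMultiViewNet(nn.Module):
    def __init__(self, num_ethnicities=2, dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # shared across views
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
        self.sex_head = nn.Linear(dim, 2)
        self.ethnicity_head = nn.Linear(dim, num_ethnicities)

    def forward(self, views):
        """views: (B, V, 3, H, W) multi-angle skull images."""
        b, v = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).flatten(1).view(b, v, -1)  # (B, V, dim)
        fused, _ = self.attn(feats, feats, feats)    # view cross-attention
        global_feat = fused.mean(dim=1)              # global view feature
        return self.sex_head(global_feat), self.ethnicity_head(global_feat)
```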
{"title":"A novel deep neural network for identification of sex and ethnicity based on unknown skulls","authors":"Haibo Zhang,&nbsp;Qianhong Li,&nbsp;Xizhi Wang,&nbsp;Qianyi Wu,&nbsp;Chaohui Ma,&nbsp;Mingquan Zhou,&nbsp;Guohua Geng","doi":"10.1016/j.patcog.2025.111450","DOIUrl":"10.1016/j.patcog.2025.111450","url":null,"abstract":"<div><div>The determination of the sex and ethnicity is crucial in the identification of unknown human remains in biology and forensic science. The components of these two biological traits can be effectively evaluated using the skull, which makes it one of the most essential structures for the aforementioned purpose. However, performing simultaneous determination of sex and ethnicity remains a challenge in the identification of unknown humans. In this study, a multi-attribute recognition framework for unknown skulls, which integrates multitask and multiview cross-attention, is proposed. Multi-angle images of the skull first serve as input to a parallel convolutional neural network, yielding its independent view features. To increase the performance of the skull multi-attribute recognition, a view cross-attention mechanism is then introduced. This mechanism uses the independent view features of the skull to obtain global view features. Afterwards, the final output structure is divided into two branches, one used to identify the gender of the skull and the other to identify its ethnicity. The experiment involves 214 samples that consist of 79 samples (41 males and 38 females) from the Han Chinese population in northern China and 135 samples (60 males and 75 females) from the Uyghur population in Xinjiang, China. The results of the experiment demonstrate that the optimal performance of the skull multi-attribute recognition model is obtained when ResNet18 is used as a feature-sharing network. The gender and ethnic identifications for the skull have accuracies of 95.94 % and 98.45 %, respectively. This verifies that the proposed method has high accuracy and generalization ability.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111450"},"PeriodicalIF":7.5,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143428173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ICV-Net: An identity cost volume network for multi-view stereo depth inference
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-09 | DOI: 10.1016/j.patcog.2025.111456
Pengpeng He, Yueju Wang, Yangsen Wen, Yong Hu, Wei He
The construction of 3D cost volumes is essential for deep learning-based Multi-view Stereo (MVS) methods. Although cascade cost volumes alleviate the GPU memory overhead and improve depth inference performance in a coarse-to-fine manner, they are still not the optimal solution for learned MVS methods. In this work, we first propose novel identity cost volumes with an identical cost volume size at each stage, which dramatically decreases the memory footprint while clearly improving depth prediction accuracy and inference speed. Depth inference is then formulated as a dense-to-sparse search problem, solved by performing a classification to locate predicted depth values. Specifically, in the first stage, a dense linear search is adopted to compute an initial depth map. The depth map is then refined by sampling fewer depth hypotheses in the following two stages. In the final stage, we exploit a binary search with only two depth hypotheses to obtain the final depth map. Combining identity cost volumes with the dense-to-sparse search strategy, we propose an identity cost volume network for MVS, denoted ICV-Net. The proposed ICV-Net is validated on competitive benchmarks. Experiments show that our method dramatically reduces memory consumption and extends learned MVS to higher-resolution scenes. Moreover, our method achieves state-of-the-art accuracy with less runtime. In particular, among all learning-based MVS methods, our method achieves the best accuracy (an accuracy score of 0.286) on the DTU benchmark with the least GPU memory (a testing memory overhead of 1221 MB).
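The dense-to-sparse schedule can be pictured with the small helper below, which builds per-pixel depth hypotheses centered on the current estimate; the stage counts and intervals in the comment are illustrative, not the paper's settings.

```python
# Conceptual sketch: sample a shrinking number of depth hypotheses around the
# current depth estimate, ending with a two-hypothesis (binary) step.
import torch

def sample_depth_hypotheses(cur_depth, interval, num_hypotheses):
    """cur_depth: (B, H, W) current depth estimate; returns (B, D, H, W)
    hypotheses spaced by `interval` and centered on the estimate."""
    offsets = (torch.arange(num_hypotheses, device=cur_depth.device)
               - (num_hypotheses - 1) / 2) * interval
    return cur_depth.unsqueeze(1) + offsets.view(1, -1, 1, 1)

# illustrative 4-stage schedule:
#   stage 1: dense linear search over the full depth range (e.g., 48 hypotheses)
#   stages 2-3: fewer hypotheses with smaller intervals around the refined map
#   stage 4: binary search with only 2 hypotheses
```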
{"title":"ICV-Net: An identity cost volume network for multi-view stereo depth inference","authors":"Pengpeng He ,&nbsp;Yueju Wang ,&nbsp;Yangsen Wen ,&nbsp;Yong Hu ,&nbsp;Wei He","doi":"10.1016/j.patcog.2025.111456","DOIUrl":"10.1016/j.patcog.2025.111456","url":null,"abstract":"<div><div>The construction of 3D cost volumes is essential for deep learning-based Multi-view Stereo (MVS) methods. Although cascade cost volumes alleviate the GPU memory overhead and improve depth inference performance in a coarse-to-fine manner, the cascade cost volumes are still not the optimal solution for the learned MVS methods. In this work, we first propose novel identity cost volumes with the identical cost volume size at each stage, which dramatically decreases memory footprint while clearly improving depth prediction accuracy and inference speed. The depth inference is then formulated as a dense-to-sparse search problem that is solved by performing a classification to locate predicted depth values. Specifically, in the first stage, a dense linear search is adopted to calculate an initial depth map. The depth map is then refined by sampling less depth hypotheses in the following two stages. In the final stage, we exploit a binary search with only two depth hypotheses to obtain the final depth map. Combining identity cost volumes with the dense-to-sparse search strategy, we propose an identity cost volume network for MVS, denoted as ICV-Net. The proposed ICV-Net is validated on competitive benchmarks. Experiments show our method can dramatically reduce the memory consumption and extend the learned MVS to higher-resolution scenes. Moreover, our method achieves state-of-the-art accuracy with less runtime. Particularly, among all the learning-based MVS methods, our method achieves the best accuracy (an accuracy score of 0.286) on DTU benchmark with the least GPU memory (with a testing memory overhead of 1221 MB).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111456"},"PeriodicalIF":7.5,"publicationDate":"2025-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143387630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
3D Points Splatting for real-time dynamic Hand Reconstruction
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-07 | DOI: 10.1016/j.patcog.2025.111426
Zheheng Jiang, Hossein Rahmani, Sue Black, Bryan Williams
We present 3D Points Splatting Hand Reconstruction (3D-PSHR), a real-time and photo-realistic hand reconstruction approach. We propose a self-adaptive canonical points upsampling strategy to achieve a high-resolution hand geometry representation. This is followed by a self-adaptive deformation that deforms the hand from the canonical space to the target pose, adapting to the dynamic changes of the canonical points, which, in contrast to the common practice of subdividing the MANO model, offers greater flexibility and results in improved geometry fitting. To model texture, we disentangle the appearance color into intrinsic albedo and pose-aware shading, which are learned through a Context-Attention module. Moreover, our approach allows the geometric and appearance models to be trained simultaneously in an end-to-end manner. We demonstrate that our method is capable of producing animatable, photorealistic and relightable hand reconstructions using multiple datasets, including monocular videos captured with handheld smartphones and large-scale multi-view videos featuring various hand poses. We also demonstrate that our approach achieves real-time rendering speeds while maintaining superior performance compared to existing state-of-the-art methods.
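The albedo/shading disentanglement can be summarized by the tiny composition step below; how the two terms are predicted (and the Context-Attention module) is not reproduced, so this is only a minimal sketch.

```python
# Minimal sketch: per-point color as intrinsic albedo times pose-aware shading.
import torch

def compose_point_colors(albedo, shading):
    """albedo: (N, 3) pose-independent RGB reflectance in [0, 1];
    shading: (N, 1) or (N, 3) pose-dependent shading factors."""
    return torch.clamp(albedo * shading, 0.0, 1.0)    # final splat colors
```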
{"title":"3D Points Splatting for real-time dynamic Hand Reconstruction","authors":"Zheheng Jiang ,&nbsp;Hossein Rahmani ,&nbsp;Sue Black ,&nbsp;Bryan Williams","doi":"10.1016/j.patcog.2025.111426","DOIUrl":"10.1016/j.patcog.2025.111426","url":null,"abstract":"<div><div>We present 3D Points Splatting Hand Reconstruction (3D-PSHR), a real-time and photo-realistic hand reconstruction approach. We propose a self-adaptive canonical points upsampling strategy to achieve high-resolution hand geometry representation. This is followed by a self-adaptive deformation that deforms the hand from the canonical space to the target pose, adapting to the dynamic changing of canonical points which, in contrast to the common practice of subdividing the MANO model, offers greater flexibility and results in improved geometry fitting. To model texture, we disentangle the appearance color into the intrinsic albedo and pose-aware shading, which are learned through a Context-Attention module. Moreover, our approach allows the geometric and the appearance models to be trained simultaneously in an end-to-end manner. We demonstrate that our method is capable of producing animatable, photorealistic and relightable hand reconstructions using multiple datasets, including monocular videos captured with handheld smartphones and large-scale multi-view videos featuring various hand poses. We also demonstrate that our approach achieves real-time rendering speeds while simultaneously maintaining superior performance compared to existing state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111426"},"PeriodicalIF":7.5,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0