
Latest publications in Pattern Recognition Letters

Semantic-aware hyper-space deformable neural radiance fields for facial avatar reconstruction
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-10 | DOI: 10.1016/j.patrec.2024.08.004
Kaixin Jin, Xiaoling Gu, Zimeng Wang, Zhenzhong Kuang, Zizhao Wu, Min Tan, Jun Yu

High-fidelity facial avatar reconstruction from monocular videos is a prominent research problem in computer graphics and computer vision. Recent advances in Neural Radiance Fields (NeRF) have demonstrated remarkable proficiency in rendering novel views, and NeRF has garnered attention for its potential in facial avatar reconstruction. However, previous methods have overlooked the complex motion dynamics across the head, torso, and intricate facial features. In addition, there is no generalized NeRF-based framework for facial avatar reconstruction that can adapt to either 3DMM coefficients or audio input. To tackle these challenges, we propose an innovative framework that leverages a semantic-aware hyper-space deformable NeRF to reconstruct high-fidelity facial avatars from either 3DMM coefficients or audio features. Our framework addresses both localized facial movements and broader head and torso motions through semantic guidance and a unified hyper-space deformation module. Specifically, we adopt a dynamic weighted ray sampling strategy that allocates varying degrees of attention to distinct semantic regions, enhancing the deformable NeRF framework with semantic guidance to capture fine-grained details across diverse facial regions. Moreover, we introduce a hyper-space deformation module that transforms observation-space coordinates into canonical hyper-space coordinates, allowing the model to learn natural facial deformation and head-torso movements. Extensive experiments validate the superiority of our framework over existing state-of-the-art methods, demonstrating its effectiveness in producing realistic and expressive facial avatars. Our code is available at https://github.com/jematy/SAHS-Deformable-Nerf.
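The dynamic weighted ray sampling idea, allocating more rays to semantically important regions such as the eyes and mouth, can be sketched as follows; the region ids and weight values here are hypothetical stand-ins, not the paper's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_ray_sampling(semantic_map, region_weights, n_rays):
    """Draw ray (pixel) coordinates with per-region sampling probabilities.

    `semantic_map` is an (H, W) integer array of semantic region ids and
    `region_weights` maps region id -> relative attention weight
    (hypothetical values standing in for the learned/scheduled weights)."""
    probs = np.vectorize(region_weights.get)(semantic_map).astype(float)
    probs /= probs.sum()
    flat = rng.choice(semantic_map.size, size=n_rays, replace=False, p=probs.ravel())
    # Return (row, col) pixel coordinates for the selected rays.
    return np.stack(np.unravel_index(flat, semantic_map.shape), axis=1)
```

With, say, a 10x weight on a mouth region, most of the ray budget concentrates there while the rest of the face is still sparsely covered.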

Pattern Recognition Letters, Volume 185, Pages 160-166.
Citations: 0
Convolutional Spiking Neural Networks targeting learning and inference in highly imbalanced datasets
IF 5.1 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.002
Bernardete Ribeiro, Francisco Antunes, Dylan Perdigão, Catarina Silva
Spiking Neural Networks (SNNs) are regarded as the next frontier in AI, as they can be implemented on neuromorphic hardware, paving the way for advancements in real-world applications in the field. SNNs provide a biologically inspired solution that is event-driven, energy-efficient and sparse. While they show promising results, challenges remain to be addressed. For example, the design-build-evaluate process for integrating the architecture, learning, hyperparameter optimization and inference needs to be tailored to a specific problem. This is particularly important in critical high-stakes industries such as financial services. In this paper, we present SpikeConv, a novel deep Convolutional Spiking Neural Network (CSNN), and investigate this process in the context of a highly imbalanced online bank account opening fraud problem. Our approach is compared with Deep Spiking Neural Networks (DSNNs) and Gradient Boosting Decision Trees (GBDT), showing competitive results.
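As background, the spiking neurons that a CSNN builds on can be illustrated with a minimal leaky integrate-and-fire (LIF) simulation; `beta` and `threshold` are illustrative values, and the soft-reset variant shown is one common choice, not necessarily the one used in SpikeConv:

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire (LIF) neuron: the membrane
    potential decays by `beta` each step, accumulates the input current,
    and a binary spike is emitted when it crosses `threshold`."""
    v, spikes = 0.0, []
    for current in inputs:
        v = beta * v + current
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # soft reset: subtract threshold, keep the surplus
        else:
            spikes.append(0)
    return spikes
```

The event-driven, binary nature of the output is what makes such networks a good fit for sparse, energy-efficient neuromorphic hardware.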
Citations: 0
Introduction to the special issue on “Computer vision solutions for part-based image analysis and classification (CV_PARTIAL)”
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.07.023
Fabio Narducci, Piercarlo Dondi, David Freire Obregón, Florin Pop
Pattern Recognition Letters, Volume 185, Pages 150-151.
Citations: 0
Feature-consistent coplane-pair correspondence- and fusion-based point cloud registration
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.001
Kuo-Liang Chung, Chia-Chi Hsu, Pei-Hsuan Hsieh

Registering two point clouds is an important and challenging task, and the estimated registration solution can be applied in 3D vision. In this paper, an outlier removal method is first proposed to delete redundant coplane-pair correspondences and construct three feature-consistent coplane-pair correspondence subsets. Next, Rodrigues' formula and a scoring-based method are adopted to solve for the representative registration solution of each correspondence subset. Then, a robust fusion method is proposed to fuse the three representative solutions into the final registration solution. Comprehensive experimental results on typical testing datasets demonstrate that our registration algorithm achieves a significant reduction in execution time compared with state-of-the-art methods while maintaining good registration accuracy.
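Rodrigues' formula, which the abstract mentions for solving each subset's registration, converts an axis-angle pair into a rotation matrix. A standard standalone implementation (not the authors' code) looks like this:

```python
import numpy as np

def rodrigues_rotation(axis, theta):
    """Rotation matrix for angle `theta` about unit vector `axis` via
    Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2,
    where K is the skew-symmetric cross-product matrix of the axis."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Example: a 90-degree rotation about z maps the x-axis to the y-axis.
R = rodrigues_rotation([0, 0, 1], np.pi / 2)
```

The closed form avoids quaternion conversions and is numerically cheap, which matters when many candidate solutions have to be scored.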

Pattern Recognition Letters, Volume 185, Pages 143-149.
Citations: 0
Enhancing zero-shot object detection with external knowledge-guided robust contrast learning
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1016/j.patrec.2024.08.003
Lijuan Duan, Guangyuan Liu, Qing En, Zhaoying Liu, Zhi Gong, Bian Ma

Zero-shot object detection aims to identify objects from unseen categories not present during training. Existing methods rely on category labels to create pseudo-features for unseen categories, but they face limitations in exploring semantic information and lack robustness. To address these issues, we introduce a novel framework, EKZSD, which enhances zero-shot object detection by incorporating external knowledge and contrastive paradigms. This framework enriches semantic diversity, enhancing discriminative ability and robustness. Specifically, we introduce a novel external knowledge extraction module that leverages attribute and relationship prompts to enrich semantic information. Moreover, a novel external knowledge contrastive learning module is proposed to enhance the model's discriminative and robust capabilities by exploring pseudo-visual features. Additionally, we use cycle consistency learning to align generated visual features with original semantic features and adversarial learning to align visual features with semantic features. Trained jointly with the contrastive learning loss, cycle consistency loss, adversarial learning loss, and classification loss, our framework achieves superior performance on the MSCOCO and Ship-43 datasets, as demonstrated in experimental results.
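The contrastive modules work by pulling matching feature pairs together and pushing mismatched ones apart. A generic InfoNCE-style loss illustrates the mechanism; this is a sketch of the general paradigm, not the paper's exact loss:

```python
import numpy as np

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss: cosine similarities of the anchor to positives
    are pushed up and similarities to negatives pushed down, with the
    sharpness controlled by the temperature `tau`."""
    def sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(np.array([sim(anchor, p) for p in positives]) / tau)
    neg = np.exp(np.array([sim(anchor, n) for n in negatives]) / tau)
    return float(-np.log(pos.sum() / (pos.sum() + neg.sum())))
```

In a setting like EKZSD, the anchor would be a visual (or pseudo-visual) feature and the positives/negatives would be semantic embeddings of matching and non-matching categories.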

Pattern Recognition Letters, Volume 185, Pages 152-159.
Citations: 0
Exploring percolation features with polynomial algorithms for classifying Covid-19 in chest X-ray images
IF 5.1 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.022
Guilherme F. Roberto, Danilo C. Pereira, Alessandro S. Martins, Thaína A.A. Tosta, Carlos Soares, Alessandra Lumini, Guilherme B. Rozendo, Leandro A. Neves, Marcelo Z. Nascimento
Covid-19 is a severe illness caused by the Sars-CoV-2 virus, initially identified in China in late 2019 and swiftly spreading globally. Since the virus primarily impacts the lungs, analyzing chest X-rays stands as a reliable and widely accessible means of diagnosing the infection. In computer vision, deep learning models such as CNNs have been the main approach adopted for detection of Covid-19 in chest X-ray images. However, we believe that handcrafted features can also provide relevant results, as shown previously in similar image classification challenges. In this study, we propose a method for identifying Covid-19 in chest X-ray images by extracting and classifying local and global percolation-based features. This technique was tested on three datasets: one comprising 2,002 segmented samples categorized into two groups (Covid-19 and Healthy); another with 1,125 non-segmented samples categorized into three groups (Covid-19, Healthy, and Pneumonia); and a third composed of 4,809 non-segmented images representing three classes (Covid-19, Healthy, and Pneumonia). Then, 48 percolation features were extracted and given as input to six distinct classifiers. Subsequently, the AUC and accuracy metrics were assessed. We used the 10-fold cross-validation approach and evaluated lesion sub-types via binary and multiclass classification using the Hermite polynomial classifier, a novel approach in this domain. The Hermite polynomial classifier exhibited the most promising outcomes compared to five other machine learning algorithms, wherein the best obtained values for accuracy and AUC were 98.72% and 0.9917, respectively. We also evaluated the influence of noise on the features and on the classification accuracy. These results, based on the integration of percolation features with the Hermite polynomial, hold the potential for enhancing lesion detection and supporting clinicians in their diagnostic endeavors.
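Percolation-based texture descriptors are built from cluster statistics of thresholded pixels. The sketch below computes two such statistics (cluster count and largest-cluster size under 4-connectivity); it is illustrative only and not the paper's exact 48-feature extractor:

```python
from collections import deque

def percolation_features(image, threshold):
    """Count 4-connected clusters of pixels at or below `threshold` and the
    size of the largest cluster. Varying the threshold and tracking how these
    statistics change is the kind of probe percolation descriptors build on."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for i in range(h):
        for j in range(w):
            if image[i][j] <= threshold and not seen[i][j]:
                size, q = 0, deque([(i, j)])
                seen[i][j] = True
                while q:  # BFS flood fill of one cluster
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and image[ny][nx] <= threshold):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                clusters.append(size)
    return {"n_clusters": len(clusters), "largest": max(clusters, default=0)}
```

Sweeping the threshold over the grayscale range and collecting these statistics at each step yields a feature vector describing the image's texture.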
Citations: 0
Feature decomposition-based gaze estimation with auxiliary head pose regression
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.021
Ke Ni, Jing Chen, Jian Wang, Bo Liu, Ting Lei, Yongtian Wang

Recognition and understanding of facial images or eye images are critical for eye tracking. Recent studies have shown that the simultaneous use of facial and eye images can effectively lower gaze errors. However, these methods typically treat facial and eye images as two unrelated inputs, without taking into account their distinct representational abilities at the feature level. Additionally, head pose implicitly learned from highly coupled facial features makes the trained model less interpretable and prone to the gaze-head overfitting problem. To address these issues, we propose a method that enhances task-relevant features while suppressing other noise by leveraging feature decomposition. We disentangle eye-related features from the facial image via a projection module and further make them distinctive with an attention-based head pose regression task, which enhances the representation of gaze-related features and makes the model less susceptible to task-irrelevant features. The mutually separated eye features and head pose are then recombined to achieve more accurate gaze estimation. Experimental results demonstrate that our method achieves state-of-the-art performance, with estimation errors of 3.90° on the MPIIGaze dataset and 5.15° on the EyeDiap dataset.
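The disentangling idea can be sketched with a linear projection that splits a feature vector into a gaze-related subspace and its residual; `W_eye` here is a hypothetical basis standing in for the paper's learned projection module:

```python
import numpy as np

def decompose_features(f, W_eye):
    """Split a facial feature vector `f` into an eye/gaze-related component
    (projection onto the column space of `W_eye`, a hypothetical learned
    basis) and a residual that keeps the remaining information, e.g. head
    pose, for the auxiliary regression task."""
    P = W_eye @ np.linalg.pinv(W_eye)  # orthogonal projector onto span(W_eye)
    eye_part = P @ f
    return eye_part, f - eye_part
```

Because the two parts sum back to the original vector, nothing is discarded; they are simply routed to the gaze head and the head-pose head separately before being recombined.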

Pattern Recognition Letters, Volume 185, Pages 137-142.
Citations: 0
Adversarial self-training for robustness and generalization
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-02 | DOI: 10.1016/j.patrec.2024.07.020
Zhuorong Li, Minghui Wu, Canghong Jin, Daiwei Yu, Hongchuan Yu

Adversarial training is currently one of the most promising ways to achieve adversarial robustness of deep models. However, even the most sophisticated training methods are far from satisfactory, as improvement in robustness requires either heuristic strategies or more annotated data, which can be problematic in real-world applications. To alleviate these issues, we propose an effective training scheme that avoids the prohibitively high cost of additional labeled data by adapting a self-training scheme to adversarial training. In particular, we first use the confident prediction for a randomly-augmented image as the pseudo-label for self-training. Then we enforce consistency regularization by targeting the adversarially-perturbed version of the same image at the pseudo-label, which implicitly suppresses the distortion of representation in latent space. Despite its simplicity, extensive experiments show that our regularization brings significant advancement in the adversarial robustness of a wide range of adversarial training methods and helps the model generalize its robustness to larger perturbations or even against unseen adversaries.
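The core of the scheme, confident pseudo-labels from an augmented view supervising the adversarial view, can be sketched as follows; the confidence threshold is an illustrative value and the loss is a simplified stand-in for the paper's regularizer:

```python
import numpy as np

def consistency_loss(probs_aug, probs_adv, conf_threshold=0.9):
    """Pseudo-label consistency regularization.

    `probs_aug`/`probs_adv` are (N, C) class-probability arrays for the
    randomly-augmented and adversarially-perturbed views of the same images.
    Samples whose augmented-view confidence falls below `conf_threshold`
    (an illustrative value) are masked out."""
    conf = probs_aug.max(axis=1)
    pseudo = probs_aug.argmax(axis=1)
    mask = conf >= conf_threshold
    if not mask.any():
        return 0.0
    # Cross-entropy of the adversarial view against the hard pseudo-label.
    ce = -np.log(probs_adv[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((ce * mask).sum() / mask.sum())
```

In training this term would be added to the usual adversarial-training objective, so no extra ground-truth labels are consumed.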

Pattern Recognition Letters, Volume 185, Pages 117-123.
Citations: 0
Editorial: Special session on IbPRIA 2023
IF 3.9 | CAS Region 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-01 | DOI: 10.1016/j.patrec.2024.06.023
Pattern Recognition Letters, Volume 184, Page 238.
引用次数: 0
Decoding class dynamics in learning with noisy labels
IF 3.9 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-01 DOI: 10.1016/j.patrec.2024.04.012

The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising performance improvements. The selection of clean samples amongst the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight that effective demarcation of samples leads to better performance. We identify the Global Noise Conundrum in existing models, where the distribution of samples is treated globally. We propose a per-class local distribution of samples and demonstrate the effectiveness of this approach in producing a better clean-noise split. We validate our proposal on several benchmarks, both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness, to extend our analysis and highlight the effectiveness of the proposed method. Source code and instructions to reproduce this paper are available at https://github.com/aldakata/CCLM/
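The contrast between a global split and the per-class local split described above can be sketched in a few lines. The mean-loss threshold used here is only a stand-in for whatever selection criterion the paper actually employs; it simply shows how a per-class rule can flag a sample as noisy that a single global threshold would accept as clean.

```python
def per_class_clean_split(losses, labels):
    """Mark a sample 'clean' when its loss is below the mean loss of its
    own class (a local, per-class rule), instead of thresholding every
    sample against one global loss distribution."""
    per_class = {}
    for loss, label in zip(losses, labels):
        per_class.setdefault(label, []).append(loss)
    class_mean = {c: sum(v) / len(v) for c, v in per_class.items()}
    return [loss <= class_mean[label] for loss, label in zip(losses, labels)]
```

When one class is learned faster than another, its losses sit in a different range, so a global threshold systematically misjudges the slower class; the per-class rule sidesteps that by comparing each sample only to its own class.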

{"title":"Decoding class dynamics in learning with noisy labels","authors":"","doi":"10.1016/j.patrec.2024.04.012","DOIUrl":"10.1016/j.patrec.2024.04.012","url":null,"abstract":"<div><p><span>The creation of large-scale datasets annotated by humans inevitably introduces noisy labels, leading to reduced generalization in deep-learning models. Sample selection-based learning with noisy labels is a recent approach that exhibits promising upbeat performance improvements<span>. The selection of clean samples amongst the noisy samples is an important criterion in the learning process of these models. In this work, we delve deeper into the clean-noise split decision and highlight the aspect that effective demarcation of samples would lead to better performance. We identify the Global Noise Conundrum in the existing models, where the distribution of samples is treated globally. We propose a per-class-based local distribution of samples and demonstrate the effectiveness of this approach in having a better clean-noise split. We validate our proposal on several benchmarks — both real and synthetic, and show substantial improvements over different state-of-the-art algorithms. We further propose a new metric, classiness to extend our analysis and highlight the effectiveness of the proposed method. 
Source code and instructions to reproduce this paper are available at </span></span><span><span>https://github.com/aldakata/CCLM/</span><svg><path></path></svg></span></p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"184 ","pages":"Pages 239-245"},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140777367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0