
Neurocomputing: Latest Articles

Locality-constrained robust discriminant non-negative matrix factorization for depression detection: An fNIRS study
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128887
Yushan Wu, Jitao Zhong, Lu Zhang, Hele Liu, Shuai Shao, Bin Hu, Hong Peng
Major depressive disorder (MDD) has an increasingly severe impact worldwide, creating a pressing need for an efficient and objective method of depression detection. Functional near-infrared spectroscopy (fNIRS), which directly monitors changes in cerebral oxygenation, has become an important tool in depression research. However, current feature extraction methods for multi-channel fNIRS data often overlook the local structure of the data and the cost of the subsequent classification. To address these challenges, we introduce a new feature extraction algorithm, locality-constrained robust discriminant non-negative matrix factorization (LRDNMF). The algorithm incorporates $\ell_{2,1}$-norm regularization, local coordinate constraints, within-class scatter distance, and total scatter distance, combining robustness, locality, and discrimination. LRDNMF enhances feature representation, reduces the impact of noise, and significantly improves classification ability. In experiments with 56 participants, LRDNMF achieves an accuracy of 90.55%, a recall of 91.48%, a precision of 90.46%, and an F1 score of 0.91 under full stimuli. These results outperform existing algorithms, validating the effectiveness of LRDNMF and demonstrating its potential for the auxiliary diagnosis of depression.
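The abstract names three building blocks of the objective: an $\ell_{2,1}$ term, within-class scatter, and total scatter. The sketch below computes each quantity under a common convention (one sample per column); it is not the LRDNMF objective or its update rules, which the abstract does not give, and the helper names are hypothetical. It also checks that the reported precision and recall are consistent with the reported F1 score.

```python
import numpy as np

def l21_norm(E: np.ndarray) -> float:
    """l_{2,1} norm: sum of the Euclidean norms of the columns of E (one column per sample).
    Unlike the squared Frobenius norm, a single outlying sample contributes only linearly."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

def scatter_traces(H: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Return (trace of within-class scatter, trace of total scatter) for the
    low-dimensional representation H (features x samples)."""
    mean_all = H.mean(axis=1, keepdims=True)
    total = float(np.sum((H - mean_all) ** 2))
    within = 0.0
    for c in np.unique(labels):
        Hc = H[:, labels == c]
        within += float(np.sum((Hc - Hc.mean(axis=1, keepdims=True)) ** 2))
    return within, total

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With the abstract's numbers, f1(0.9046, 0.9148) is about 0.9097, i.e. 0.91 after rounding.
```

A small within-class trace relative to the total trace indicates that samples of the same class cluster tightly compared with the overall spread, which is the kind of discriminative property the abstract describes.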
Neurocomputing, Volume 617, Article 128887
Citations: 0
Extending the learning using privileged information paradigm to logistic regression
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128869
Mario Martínez-García, Susana García-Gutierrez, Lasai Barreñada, Iñaki Inza, Jose A. Lozano
Learning using privileged information is a learning paradigm that exploits privileged features, which are available at training time but not at prediction time, as additional information for training models. This paper studies the learning of logistic regression models using privileged information and provides two new algorithms. In developing them, the parameters of a conventional logistic regression trained with all available features (privileged and regular) are projected onto the parameter space associated with the regular features, which are available at both training and prediction time. The projection that yields the model parameters is performed by minimizing two different loss functions, one governed by logit terms and the other by posterior probabilities. In addition, a metric is proposed to determine whether using privileged information can improve performance. Experimental results show that the proposed methods improve on conventional logistic regression learned without privileged information.
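One way to read "projecting the parameters onto the regular-feature space by minimizing a loss on logit terms" is as a least-squares fit of the full model's logits using only the regular features. The sketch below assumes that reading; the paper's actual loss functions (including the posterior-probability variant) may differ, and `fit_logreg` and `project_on_logits` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Append a constant column so the last weight acts as the intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logreg(X, y, lr=0.1, epochs=2000):
    """Conventional logistic regression via batch gradient descent on the mean log-loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def project_on_logits(X_all, X_reg, w_all):
    """Assumed projection: choose regular-feature weights whose logits best match,
    in the least-squares sense, the logits of the model trained on all features."""
    target_logits = add_bias(X_all) @ w_all
    w_reg, *_ = np.linalg.lstsq(add_bias(X_reg), target_logits, rcond=None)
    return w_reg
```

At prediction time only `X_reg` is needed: `sigmoid(add_bias(X_reg) @ w_reg)` gives the posterior probabilities. The least-squares form was chosen because it has a closed-form solution; a posterior-matching loss would instead require an iterative solver.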
Neurocomputing, Volume 615, Article 128869
Citations: 0
Cross-modal attention and geometric contextual aggregation network for 6DoF object pose estimation
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128891
Yi Guo, Fei Wang, Hao Chu, Shiguang Wen
The availability of affordable RGB-D sensors makes RGB-D images well suited to accurate 6D pose estimation, allowing precise prediction of the 6D pose parameters at a reasonable cost. A crucial research challenge is how to adaptively extract and fuse features from the appearance information in RGB images and the geometric information in depth images. Moreover, previous methods have neglected the spatial geometric relationships between local positions and the properties of point features, both of which help in tackling pose estimation under occlusion. In this work, we propose a cross-attention fusion framework for learning 6D pose estimation from RGB-D images. In the feature extraction stage, we design a geometry-aware context network that encodes the local geometric properties of objects in point clouds using two criteria: distance and geometric angle. We further propose a cross-attention framework that combines spatial and channel attention in a cross-modal manner. This framework captures the correlation and relative importance of RGB and depth features, improving pose estimation accuracy, particularly in complex scenes. Experiments show that the proposed method outperforms state-of-the-art methods on four challenging benchmark datasets: YCB-Video, LineMOD, Occlusion LineMOD, and MP6D. Video is available at https://youtu.be/4mgdbQKaHOc.
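The cross-modal attention idea can be illustrated with standard scaled dot-product attention in which queries come from RGB features and keys and values come from depth features. This is a minimal sketch under that assumption, not the paper's architecture: the projection matrices are random placeholders rather than learned weights, and `channel_gate` is a hypothetical, parameter-free simplification of squeeze-and-excitation-style channel attention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, depth, Wq, Wk, Wv):
    """Spatial cross-attention: each RGB point feature attends over all depth point features.

    rgb, depth: (N, C) per-point features from the two modalities.
    Returns fused features of shape (N, Wv.shape[1]) and the (N, N) attention map.
    """
    Q = rgb @ Wq
    K = depth @ Wk
    V = depth @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # each row sums to 1
    return attn @ V, attn

def channel_gate(feat):
    """Channel attention without learned weights: scale each channel by the
    sigmoid of its mean activation over all points."""
    gate = 1.0 / (1.0 + np.exp(-feat.mean(axis=0)))
    return feat * gate
```

A symmetric block (depth queries attending over RGB keys and values) would let information flow in both directions; which direction, or both, the paper uses is not stated in the abstract.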
Neurocomputing, Volume 617, Article 128891
Citations: 0
Fully end-to-end EEG to speech translation using multi-scale optimized dual generative adversarial network with cycle-consistency loss
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128916
Chen Ma, Yue Zhang, Yina Guo, Xin Liu, Hong Shangguan, Juan Wang, Luqing Zhao
Decoding auditory evoked electroencephalographic (EEG) signals, correlating them with speech acoustic features, and constructing transitional signals between the two signal domains is a challenging and fascinating research topic. Brain–computer interface (BCI) technologies that incorporate auditory evoked potentials (AEPs) can not only use encoder–decoder architectures for signal decoding but also employ generative adversarial networks (GANs) to translate human neural activity to speech (T-HNAS). In previous research, however, the cascading ratio of transitional signals causes varying degrees of information loss in the signals of both domains, and the optimal ratio differs across datasets, which impairs translation effectiveness. To address these issues, we propose an improved dual generative adversarial network based on multi-scale optimization and cycle-consistency loss (MSCC-DualGAN). We use the cycle-consistency loss, which facilitates cross-modal signal conversion, to replace transitional signals and preserve the integrity of the signals in both domains during loss computation. Multi-scale optimization refines the details of signals downsampled by the network and increases the similarity between features, enabling efficient, fully end-to-end EEG-to-speech translation. To validate the network, we construct a new EEG dataset and evaluate it using mel cepstral distortion (MCD), the Pearson correlation coefficient (PCC), and the structural similarity index measure (SSIM). Experimental results show that the new network significantly outperforms previous methods on auditory stimulus datasets.
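The cycle-consistency loss and two of the evaluation metrics can be sketched as follows. The generators `G` (EEG to speech) and `F` (speech to EEG) are placeholder callables standing in for trained networks, and the MCD follows a common convention (energy coefficient c0 excluded, constant $10/\ln 10 \cdot \sqrt{2\sum_d (c_d - \hat c_d)^2}$ per frame); the paper's exact MCD variant is an assumption here, and SSIM is omitted.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: x -> G -> F should return to x, and y -> F -> G should return to y.
    G maps the EEG domain to the speech domain; F maps speech back to EEG."""
    return float(np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y)))

def pcc(a, b):
    """Pearson correlation coefficient between two flattened signals."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def mcd(mc_ref, mc_gen):
    """Mel cepstral distortion in dB, averaged over frames.

    mc_ref, mc_gen: (frames, order) mel-cepstral coefficients; column 0 (energy) is skipped.
    """
    diff = mc_ref[:, 1:] - mc_gen[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```

Because the cycle loss only compares each signal with its own reconstruction, it needs no intermediate "transitional" signal, which matches the motivation given in the abstract.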
Neurocomputing, Volume 616, Article 128916
Citations: 0
PSscheduler: A parameter synchronization scheduling algorithm for distributed machine learning in reconfigurable optical networks
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128876
Ling Liu, Xiaoqiong Xu, Pan Zhou, Xi Chen, Daji Ergu, Hongfang Yu, Gang Sun, Mohsen Guizani
As training datasets and models grow, the parameter synchronization stage places a heavy burden on the network, and communication has become one of the main performance bottlenecks of distributed machine learning (DML). At the same time, optical circuit switches (OCS), which offer high bandwidth and reconfigurability, are increasingly being introduced into network topologies, yielding reconfigurable optical networks. OCS can accelerate the parameter synchronization stage and thus improve training performance. However, because the OCS switching delay is non-negligible, a poorly designed circuit scheduling algorithm can greatly increase parameter synchronization time. Moreover, most existing circuit scheduling algorithms do not effectively exploit the training characteristics of DML, so their performance gains are limited. In this paper, we therefore study parameter synchronization scheduling in reconfigurable optical networks and propose PSscheduler, which jointly optimizes circuit scheduling and the deployment of parameter servers in the parameter server (PS) architecture. Specifically, we first establish a mathematical optimization model that accounts for parameter server deployment, parameter block allocation, and circuit scheduling. We then solve the model using variable relaxation and deterministic rounding. Simulations based on real DML workloads show that, compared with Sunflow and HLF, PSscheduler is more stable and reduces parameter synchronization time (PST) by up to 46.61% and 25%, respectively.
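The "relaxation plus deterministic rounding" step can be illustrated on the parameter-block allocation part alone. Assuming the relaxed model yields a fractional matrix where entry (i, j) is the share of block i placed on server j, a simple deterministic rounding assigns each block to the server with the largest share. This is a hypothetical sketch: it ignores the capacity and circuit-scheduling constraints that the paper's model also includes, and the reduction helper just restates how a percentage PST reduction is computed.

```python
import numpy as np

def round_assignment(x_frac):
    """Deterministic rounding of a fractional block-to-server assignment.

    x_frac: (blocks, servers) relaxed solution, each row summing to 1.
    Returns a 0/1 matrix placing each block on its largest-share server
    (ties go to the lowest server index).
    """
    x_frac = np.asarray(x_frac, dtype=float)
    x_int = np.zeros_like(x_frac, dtype=int)
    x_int[np.arange(x_frac.shape[0]), x_frac.argmax(axis=1)] = 1
    return x_int

def pct_reduction(baseline, new):
    """Percentage reduction of a time relative to a baseline time."""
    return 100.0 * (baseline - new) / baseline
```

For example, a 46.61% PST reduction means the new synchronization time is 53.39% of the baseline's.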
Neurocomputing, Volume 616, Article 128876
Citations: 0
Potential Knowledge Extraction Network for Class-Incremental Learning
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128923
Xidong Xi, Guitao Cao, Wenming Cao, Yong Liu, Yan Li, Hong Wang, He Ren
Class-Incremental Learning (CIL) aims to learn new classes dynamically without forgetting old ones, typically by extracting knowledge from old data and continuously transferring it to new tasks. In replay-based approaches, selecting appropriate exemplars is crucial, since exemplars are the most direct means of retaining old knowledge. In this paper, we propose a novel CIL framework, the Potential Knowledge Extraction Network (PKENet), which addresses the neglect of inter-sample relational knowledge in most existing work and introduces a new approach to exemplar selection. Specifically, to address the challenge of knowledge transfer, we design a relation consistency loss and a hybrid cross-entropy loss: the former extracts structural knowledge from the old model, while the latter captures graph-wise knowledge, together enabling the new model to retain more old knowledge. To strengthen the anti-forgetting effect of the exemplar set, we devise a maximum-forgetting-priority method that selects the samples most susceptible to interference from model updates. To overcome the prediction bias problem in CIL, we introduce the Total Direct Effect inference method into our model. Experimental results on the CIFAR100, ImageNet-Full, and ImageNet-Subset datasets show that multiple state-of-the-art CIL methods can be directly combined with PKENet to achieve significant performance improvements. Code: https://github.com/XXDyeah/PKENet.
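Two of the ideas above can be sketched in a few lines under explicit assumptions: a relation consistency loss taken as the mean squared difference between the old and new models' pairwise cosine-similarity matrices over a batch, and a maximum-forgetting-priority selection taken as "pick the samples whose true-class probability dropped most after the update". Both definitions are one plausible reading of the abstract, not PKENet's published formulas (see the linked repository for those), and the Total Direct Effect inference step is not covered.

```python
import numpy as np

def cosine_sim_matrix(feats):
    """Pairwise cosine similarities between the rows of a (batch, dim) feature matrix."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T

def relation_consistency_loss(feat_old, feat_new):
    """Penalize changes in inter-sample structure between the old and new feature extractors."""
    diff = cosine_sim_matrix(feat_old) - cosine_sim_matrix(feat_new)
    return float(np.mean(diff ** 2))

def select_most_forgotten(p_old, p_new, labels, k):
    """Return indices of the k samples whose true-class probability fell the most.

    p_old, p_new: (samples, classes) softmax outputs before and after a model update.
    """
    idx = np.arange(len(labels))
    drop = p_old[idx, labels] - p_new[idx, labels]
    return np.argsort(-drop, kind="stable")[:k]
```

Because the loss compares similarity matrices rather than raw features, it is invariant to rescaling the features, so the new model is constrained only in how samples relate to each other.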
Neurocomputing, Volume 616, Article 128923
Citations: 0
Superpixel semantics representation and pre-training for vision–language tasks
IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-17 | DOI: 10.1016/j.neucom.2024.128895
Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Jun Yang, Lizhi Bai, Shangce Gao
The key to integrating vision–language tasks is a good alignment strategy. Recently, visual semantic representations have achieved fine-grained visual understanding by dividing images into grids or patches. However, coarse-grained semantic interactions in image space should not be ignored; neglecting them hinders the extraction of complex contextual semantic relations at scene boundaries. This paper proposes superpixels as comprehensive and robust visual primitives: clustering perceptually similar pixels mines coarse-grained semantic interactions and speeds up the subsequent processing of primitives. To capture superpixel-level semantic features, we propose a Multiscale Difference Graph Convolutional Network (MDGCN), which parses the entire image as a fine-to-coarse visual hierarchy. To reason about actual semantic relations, we reduce potential noise interference by aggregating difference information between adjacent graph nodes. Finally, we propose a bottom-up multi-level fusion rule that avoids misinterpretation by mining complementary spatial information at different levels. Experiments show that the proposed method effectively promotes the learning of multiple downstream tasks. Encouragingly, our method outperforms previous methods on all metrics.
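"Aggregating difference information between adjacent graph nodes" can be illustrated with a single graph-convolution layer in which each superpixel node combines its own feature with the mean of $(h_j - h_i)$ over its neighbours $j$. This is a hypothetical layer written for illustration; the abstract does not give MDGCN's exact formulation, multiscale structure, or weight sharing, and `W_self` / `W_diff` are assumed names for the learned weights.

```python
import numpy as np

def difference_graph_conv(H, A, W_self, W_diff):
    """One difference-aggregation graph convolution layer.

    H: (N, C) node features, e.g. one row per superpixel.
    A: (N, N) 0/1 adjacency between neighbouring superpixels.
    W_self, W_diff: (C, C_out) weight matrices (hypothetical names).
    """
    deg = A.sum(axis=1, keepdims=True)
    has_neigh = deg > 0
    mean_neigh = (A @ H) / np.where(has_neigh, deg, 1.0)
    # Mean of (h_j - h_i) over neighbours j; zero for isolated nodes.
    diff = np.where(has_neigh, mean_neigh - H, 0.0)
    return np.maximum(0.0, H @ W_self + diff @ W_diff)  # ReLU
```

In a region where neighbouring superpixels have similar features, the difference term is close to zero, so the layer mainly reacts at boundaries where features change. That is one intuition for why difference aggregation can help with relations at scene boundaries.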
Neurocomputing, Volume 615, Article 128895
Citations: 0
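The superpixel paper above (Article 128895) describes its MDGCN as reducing noise by aggregating difference information between adjacent graph nodes. Below is a minimal, hypothetical sketch of one difference-aggregation step on a superpixel adjacency graph. The function name `difference_aggregate`, the mixing weight `alpha`, and the update rule h_i + alpha * mean(h_j - h_i) are illustrative assumptions. This is not the paper's layer, which also involves learned weights and multiple scales.

```python
from typing import Dict, List


def difference_aggregate(
    features: Dict[int, List[float]],
    adjacency: Dict[int, List[int]],
    alpha: float = 0.5,
) -> Dict[int, List[float]]:
    """One hypothetical difference-aggregation step on a superpixel graph.

    Each node moves toward its neighbors by alpha times the mean of the
    differences (h_j - h_i); nodes without neighbors are copied unchanged.
    """
    updated: Dict[int, List[float]] = {}
    for node, h in features.items():
        neighbors = adjacency.get(node, [])
        if not neighbors:
            updated[node] = list(h)
            continue
        mean_diff = [
            sum(features[j][k] - h[k] for j in neighbors) / len(neighbors)
            for k in range(len(h))
        ]
        updated[node] = [h[k] + alpha * mean_diff[k] for k in range(len(h))]
    return updated


# Three superpixels in a chain 0 - 1 - 2 with one-dimensional features.
feats = {0: [0.0], 1: [1.0], 2: [4.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
print(difference_aggregate(feats, adj))  # {0: [0.5], 1: [1.5], 2: [2.5]}
```

With alpha = 0.5, each node moves halfway toward the mean of its neighbors. This damps an isolated outlier (node 2 drops from 4.0 to 2.5) while the node's own feature still dominates.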
The effect of the head number for multi-head self-attention in remaining useful life prediction of rolling bearing and interpretability
IF 5.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-11-16 DOI: 10.1016/j.neucom.2024.128946
Qiwu Zhao, Xiaoli Zhang, Fangzhen Wang, Panfeng Fan, Erick Mbeka
As a machine learning (ML) model, the multi-head self-attention mechanism (MSM) can encode high-level feature representations and offers computational advantages. It also processes sequences systematically without relying on recurrent neural network (RNN) models. However, model performance and computational results depend on the number of heads. Because the internal working mechanism is complex, the lack of interpretability of this effect has become a primary obstacle. This study therefore investigates how the MSM head number affects prediction accuracy, model robustness, and computational efficiency in remaining useful life (RUL) prediction of rolling bearings. The results show that both too many and too few heads reduce prediction accuracy. In addition, selecting more heads makes the model more robust and raises its predictive efficiency. These effects are explained by visualizing the attention weight distribution and the functional networks. The functional networks are constructed through an equivalent fully connected layer and solved with graph theory analysis. The attention coefficient distributions during training and prediction show that, with too few heads, representative information is captured inadequately, so the MSM fails to assign large attention coefficients to degradation information. Conversely, a model with too many heads acquires both representative degradation information and redundant information. This redundant information disturbs the attention weight distribution and causes attention to be misallocated. Both cases reduce prediction accuracy. In addition, selection rules for the head number are established based on feature complexity, measured by sample entropy (SamEn). A local range for head selection is also identified from the relationship between head number and feature complexity. The effects of the head number on model robustness and computational efficiency are explained by changes in three parameters of the graph constructed after solving the functional network: the average clustering coefficient, the global efficiency, and the average shortest path length. The study provides a reference for using MSM to predict rolling bearing RUL with high accuracy, high computational efficiency, and strong robustness.
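To make the role of the head number concrete, here is a minimal pure-Python sketch of scaled dot-product multi-head self-attention. It assumes identity Q/K/V projections, whereas a real MSM layer learns these along with an output projection. Under that assumption, each of the `num_heads` heads attends over its own `d_model / num_heads` slice of the features. Changing `num_heads` therefore changes both the per-head dimension and the number of distinct attention-weight matrices available for visualization.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: Matrix, num_heads: int) -> Tuple[Matrix, List[Matrix]]:
    """Scaled dot-product self-attention split across num_heads heads.

    Assumption: learned Q/K/V and output projections are replaced by the
    identity, so head h attends over feature slice [h*d_head, (h+1)*d_head).
    Returns the concatenated head outputs and one weight matrix per head.
    """
    seq_len, d_model = len(x), len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    out = [[0.0] * d_model for _ in range(seq_len)]
    head_weights: List[Matrix] = []
    for h in range(num_heads):
        start = h * d_head
        part = [row[start:start + d_head] for row in x]  # Q = K = V for this head
        weights: Matrix = []
        for i in range(seq_len):
            scores = [
                sum(a * b for a, b in zip(part[i], part[j])) / math.sqrt(d_head)
                for j in range(seq_len)
            ]
            w = softmax(scores)
            weights.append(w)
            for k in range(d_head):
                out[i][start + k] = sum(w[j] * part[j][k] for j in range(seq_len))
        head_weights.append(weights)
    return out, head_weights


x = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
for heads in (1, 2, 4):
    _, weights = multi_head_self_attention(x, heads)
    print(heads, "head(s):", len(weights), "attention matrices of size", len(weights[0]), "x", len(weights[0][0]))
```

Running the loop with 1, 2 and 4 heads yields 1, 2 and 4 weight matrices over the same three time steps. Each head sees a narrower feature slice as the head count grows.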
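The head-selection rule in the abstract above relies on sample entropy (SamEn) as its measure of feature complexity. The sketch below implements the standard SampEn(m, r) definition. It counts pairs of length-m templates whose Chebyshev distance is within r, excluding self-matches, repeats the count for length m + 1, and returns -ln(A/B). The defaults m = 2 and an absolute tolerance `r` are assumptions, because the abstract does not give the paper's settings. A common convention, used in the example, is r = 0.2 times the series' standard deviation.

```python
import math
import statistics
from typing import List, Sequence


def sample_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """SampEn(m, r) = -ln(A / B) with Chebyshev distance and no self-matches.

    B counts template pairs of length m within tolerance r, A counts pairs of
    length m + 1. Both use the first len(series) - m start positions so the
    counts are comparable. Returns math.inf when A or B is zero.
    """
    n = len(series)

    def matching_pairs(length: int) -> int:
        templates: List[Sequence[float]] = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)


signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.1, 0.0, 1.0]
print(sample_entropy(signal, m=2, r=0.2 * statistics.pstdev(signal)))
```

A perfectly regular signal scores 0. A signal with no repeating patterns at tolerance r scores infinity, because no templates match. Higher finite values indicate more complex features.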
The effect of the head number for multi-head self-attention in remaining useful life prediction of rolling bearing and interpretability. Neurocomputing, vol. 616, Article 128946, 2024. DOI: 10.1016/j.neucom.2024.128946
Citations: 0
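The head-number study above explains robustness and efficiency through three graph parameters of its functional network: average clustering coefficient, global efficiency, and average shortest path length. The sketch below computes all three for an unweighted, undirected graph in pure Python. The abstract does not say how attention weights become edges, so `graph_from_attention` is a hypothetical construction: it links two time steps when either attention weight exceeds a threshold. For larger graphs, `networkx` provides `average_clustering`, `global_efficiency` and `average_shortest_path_length` as cross-checks.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def graph_from_attention(weights: List[List[float]], threshold: float) -> Graph:
    """Hypothetical construction: link steps i and j if either attention weight exceeds threshold."""
    n = len(weights)
    graph: Graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if weights[i][j] > threshold or weights[j][i] > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(graph: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for u, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def global_efficiency(graph: Graph) -> float:
    """Mean of 1/d(u, v) over ordered node pairs; unreachable pairs contribute 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    inv_sum = 0.0
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if v != u:
                inv_sum += 1.0 / d
    return inv_sum / (n * (n - 1))


def average_shortest_path_length(graph: Graph) -> float:
    """Mean hop distance over ordered node pairs; requires a connected graph."""
    n = len(graph)
    total = 0
    for u in graph:
        dist = bfs_distances(graph, u)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


attn = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
g = graph_from_attention(attn, threshold=0.25)
print(average_clustering(g), global_efficiency(g), average_shortest_path_length(g))
```

The example attention matrix yields the path graph 0 - 1 - 2. Its clustering is 0, its global efficiency is 5/6, and its average shortest path length is 4/3. A fully connected triangle would score 1 on all three measures.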
CRMSP: A semi-supervised approach for key information extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling
IF 5.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-11-16 DOI: 10.1016/j.neucom.2024.128907
Qi Zhang, Yonghong Song, Pengcheng Guo, Yangyang Hui
There is a growing demand in the field of Key Information Extraction (KIE) to apply semi-supervised learning (SSL) to save labor and cost, because training on document data with fully supervised methods requires labor-intensive manual annotation. Applying SSL to KIE faces two main challenges. First, the confidence of tail classes in the long-tailed distribution is underestimated. Second, intra-class compactness and inter-class separability of tail features are difficult to achieve. To address these challenges, we propose CRMSP, a novel semi-supervised approach for KIE with Class-Rebalancing and Merged Semantic Pseudo-Labeling. First, the Class-Rebalancing Pseudo-Labeling (CRP) module introduces a reweighting factor that rebalances pseudo-labels, increasing attention to tail classes. Second, the Merged Semantic Pseudo-Labeling (MSP) module clusters the tail features of unlabeled data by assigning samples to Merged Prototypes (MP). Additionally, we design a new contrastive loss specifically for MSP. Extensive experiments on three well-known benchmarks demonstrate that CRMSP achieves state-of-the-art performance. Notably, CRMSP improves the F1-score by 3.24% over the previous state of the art on CORD.
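The abstract does not specify the CRP module's reweighting factor. As a hedged illustration of the general idea, the sketch below scales the predicted class's confidence by a hypothetical factor (n_max / n_c) ** alpha. Tail classes with fewer labeled samples can then clear the pseudo-label threshold `tau` more easily. The factor, `alpha`, and `tau` are assumptions, not CRMSP's definitions.

```python
from typing import List, Optional, Sequence


def rebalanced_pseudo_labels(
    probs: Sequence[Sequence[float]],
    class_counts: Sequence[int],
    tau: float = 0.9,
    alpha: float = 0.5,
) -> List[Optional[int]]:
    """Return a pseudo-label per unlabeled sample, or None if it is filtered out.

    Hypothetical reweighting: the confidence of the predicted class c is scaled
    by (n_max / n_c) ** alpha and clipped to 1 before comparing with tau, so
    rarer (tail) classes pass the threshold more easily. alpha = 0 disables it.
    """
    n_max = max(class_counts)
    weights = [(n_max / max(count, 1)) ** alpha for count in class_counts]
    labels: List[Optional[int]] = []
    for p in probs:
        c = max(range(len(p)), key=p.__getitem__)  # predicted class
        confidence = min(1.0, p[c] * weights[c])
        labels.append(c if confidence >= tau else None)
    return labels


probs = [[0.95, 0.05], [0.3, 0.7], [0.6, 0.4]]
counts = [100, 25]  # class 0 is a head class, class 1 a tail class
print(rebalanced_pseudo_labels(probs, counts))             # [0, 1, None]
print(rebalanced_pseudo_labels(probs, counts, alpha=0.0))  # [0, None, None]
```

With class counts [100, 25] and alpha = 0.5, the tail class's confidence is doubled. A tail prediction at 0.7 is therefore accepted, while a head prediction at 0.6 is still rejected.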
CRMSP: A semi-supervised approach for key information extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling. Neurocomputing, vol. 616, Article 128907, 2024. DOI: 10.1016/j.neucom.2024.128907
Citations: 0
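CRMSP's MSP module assigns unlabeled samples to Merged Prototypes and trains them with a dedicated contrastive loss. The abstract defines neither how prototypes are merged nor the exact loss. What follows is therefore a generic prototype-contrastive sketch under stated assumptions. Each feature is pseudo-labeled with its most cosine-similar prototype, and the loss is the cross-entropy of a temperature-scaled softmax over all prototypes. The function names and the default `temperature` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def prototype_pseudo_label_loss(
    z: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> Tuple[int, float]:
    """Assign z to its nearest prototype and return (index, contrastive loss)."""
    logits = [cosine(z, p) / temperature for p in prototypes]
    target = max(range(len(logits)), key=logits.__getitem__)
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return target, log_sum_exp - logits[target]


def batch_loss(
    batch: Sequence[Sequence[float]],
    prototypes: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean prototype-contrastive loss over a batch of unlabeled features."""
    return sum(prototype_pseudo_label_loss(z, prototypes, temperature)[1] for z in batch) / len(batch)


protos = [[1.0, 0.0], [0.0, 1.0]]
print(prototype_pseudo_label_loss([0.9, 0.2], protos))
print(batch_loss([[0.9, 0.2], [0.1, 0.8]], protos))
```

A lower temperature sharpens the softmax. That pulls each feature more strongly toward its assigned prototype and away from the others, which corresponds to the intra-class compactness and inter-class separability the abstract targets.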
HifiDiff: High-fidelity diffusion model for face hallucination from tiny non-frontal faces
IF 5.5 CAS Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-11-16 DOI: 10.1016/j.neucom.2024.128882
Wei Wang, Xing Wang, Yuguang Shi, Xiaobo Lu
Obtaining a high-quality frontal facial image from a low-resolution (LR) non-frontal facial image is crucial for many facial analysis tasks. Recently, diffusion models (DMs) have made impressive progress in near-frontal face super-resolution. However, when faced with non-frontal LR faces, existing DMs show poor identity preservation and poor facial detail fidelity. In this paper, we present HifiDiff, a novel high-fidelity DM that simultaneously super-resolves and frontalizes tiny non-frontal facial images. It uses a two-stage pipeline: facial preview and facial refinement. In the first stage, we pretrain a coarse restoration module to obtain a coarse high-resolution (HR) frontal face. This face serves as a strong constraint condition that makes the complex inverse transformation easier to solve. In the second stage, we leverage the strong generative capabilities of a latent DM to refine the facial details. Specifically, we design a two-pathway control structure to control the denoising process, consisting of a facial prior guidance (FPG) module and an identity consistency (IDC) module. FPG encodes multilevel features derived from the latent coarse HR frontal face and employs hybrid cross-attention to capture their intrinsic correlations with the denoiser features, thereby improving the fidelity of facial details. IDC uses contrastive learning to extract high-level semantic features that represent identity and uses them to constrain the denoiser, thereby preserving facial identity. Extensive experiments demonstrate that HifiDiff produces high-fidelity, realistic HR frontal facial images. It surpasses other state-of-the-art methods in qualitative and quantitative analyses as well as in downstream face recognition tasks.
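HifiDiff's FPG module uses cross-attention so that denoiser features can query features derived from the coarse HR frontal face. The abstract does not describe the "hybrid" variant, so the sketch below shows only plain scaled dot-product cross-attention. Queries are projected from denoiser tokens, and keys and values are projected from prior tokens, so the two sequences may differ in length. The projection matrices `w_q`, `w_k` and `w_v` are illustrative inputs, not HifiDiff's learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) list-of-lists matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries_src: Matrix, context: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Each query token (e.g. denoiser feature) attends over all context tokens (e.g. prior features)."""
    q = matmul(queries_src, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    d = len(q[0])  # shared query/key dimension
    out: Matrix = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(weights[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out


ident = [[1.0, 0.0], [0.0, 1.0]]
denoiser_tokens = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
prior_tokens = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.7], [0.0, 1.0]]
print(cross_attention(denoiser_tokens, prior_tokens, ident, ident, ident))
```

The output has one row per denoiser token. Each row is a convex combination of the projected prior tokens, which is how prior information is injected into the denoising path.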
HifiDiff: High-fidelity diffusion model for face hallucination from tiny non-frontal faces. Neurocomputing, vol. 616, Article 128882, 2024. DOI: 10.1016/j.neucom.2024.128882
Citations: 0