
Latest publications in Neural Networks

GlobalSR: Global context network for single image super-resolution via deformable convolution attention and fast Fourier convolution
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.neunet.2024.106686

Vision Transformers have achieved impressive performance in image super-resolution. However, they suffer from low inference speed, mainly because of the quadratic complexity of multi-head self-attention (MHSA), which is the key to learning long-range dependencies. In contrast, most CNN-based methods neglect the important effect of global contextual information, resulting in inaccurate and blurred details. A method that makes the best of both Transformers and CNNs could achieve a better trade-off between image quality and inference speed. Based on this observation, we first assume that the main factor affecting the performance of Transformer-based SR models is the general architecture design, not the specific MHSA component. To verify this, we conduct ablation studies in which MHSA is replaced with large-kernel convolutions, alongside other essential module replacements. Surprisingly, the derived models achieve competitive performance. We therefore extract a general architecture design, GlobalSR, which leaves the core modules of Transformer-based SR models (the blocks and domain embeddings) unspecified. It also contains three practical guidelines for designing a lightweight SR network that exploits image-level global contextual information to reconstruct SR images. Following the guidelines, the blocks and domain embeddings of GlobalSR are instantiated via a Deformable Convolution Attention Block (DCAB) and a Fast Fourier Convolution Domain Embedding (FCDE), respectively. The resulting instantiation, termed GlobalSR-DF, uses a DCA that extracts the global contextual feature at the block level with a Deformable Convolution and a Hadamard product as the attention map. Meanwhile, the FCDE uses the Fast Fourier Transform to map the input spatial feature into the frequency domain and then extracts image-level global information from it by convolutions.
Extensive experiments demonstrate that GlobalSR is key to achieving a superior trade-off between SR quality and efficiency. Specifically, our proposed GlobalSR-DF outperforms state-of-the-art CNN-based and ViT-based SISR models in accuracy–speed trade-offs, with sharp and natural details.
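The two core operations in this abstract lend themselves to a minimal NumPy sketch. Everything here is our simplification: the deformable convolution that would produce the attention map is replaced by a precomputed sigmoid map, and the FCDE's frequency-domain convolutions by a single per-frequency weight; only the Hadamard-product attention and the FFT round trip come from the text.

```python
import numpy as np

def hadamard_attention(feature, attn_map):
    # Block-level attention as the abstract describes it: the attention map
    # is applied via a Hadamard (element-wise) product over the feature.
    return feature * attn_map

def fourier_global_mix(feature, weight):
    # Sketch of the FCDE idea: move the spatial feature into the frequency
    # domain, transform it there (a per-frequency weight stands in for the
    # paper's convolutions), and map it back. Every output pixel then
    # depends on every input pixel, i.e. an image-level receptive field.
    spec = np.fft.fft2(feature) * weight
    return np.fft.ifft2(spec).real

x = np.random.rand(8, 8)
attn = 1.0 / (1.0 + np.exp(-x))          # a sigmoid-style attention map (assumption)
y = hadamard_attention(x, attn)
z = fourier_global_mix(x, weight=1.0)    # identity weight recovers the input
```

The FFT branch is what gives a single layer global context: a pointwise product in frequency space is a circular convolution over the whole image in spatial space.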

Citations: 0
JFDI: Joint Feature Differentiation and Interaction for domain adaptive object detection
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.neunet.2024.106682

In unsupervised domain adaptive object detection, learning target-specific features is pivotal for enhancing detector performance. However, previous methods mostly concentrated on aligning domain-invariant features across domains and neglected integrating the specific features. To tackle this issue, we introduce a novel feature learning method called Joint Feature Differentiation and Interaction (JFDI), which significantly boosts the adaptability of the object detector. We construct a dual-path architecture based on our proposed feature differentiation modules: one path, guided by the source-domain data, uses multiple discriminators to confuse and align domain-invariant features; the other path, tailored to the target domain, learns its distinctive characteristics from pseudo-labeled target data. Subsequently, we implement an interactive enhancement mechanism between these paths to ensure stable feature learning and to mitigate interference from pseudo-label noise during iterative optimization. Additionally, we devise a hierarchical pseudo-label fusion module that consolidates more comprehensive and reliable results. We also analyze the generalization error bound of JFDI, which provides a theoretical basis for its effectiveness. Extensive empirical evaluations across diverse benchmark scenarios demonstrate that our method is advanced and efficient.
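A common ingredient behind "mitigating pseudo-label noise" is confidence-based filtering of target predictions. The sketch below is a generic, hypothetical version of that step (the threshold value and the filtering rule are our assumptions, not taken from the paper):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    # Hypothetical filtering step: keep only target-domain predictions whose
    # top-class confidence exceeds a threshold, so low-confidence (likely
    # noisy) pseudo-labels do not drive the target-specific path.
    conf = probs.max(axis=1)        # confidence of the predicted class
    labels = probs.argmax(axis=1)   # candidate pseudo-labels
    mask = conf >= threshold        # which samples survive the filter
    return labels[mask], mask

probs = np.array([[0.95, 0.05],    # confident -> kept
                  [0.55, 0.45],    # ambiguous -> dropped
                  [0.05, 0.95]])   # confident -> kept
labels, mask = select_pseudo_labels(probs)
```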

Citations: 0
A dual-region speech enhancement method based on voiceprint segmentation
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.neunet.2024.106683

Single-channel speech enhancement primarily relies on deep learning models to recover clean speech signals from noise-contaminated speech. These models establish a mapping between noisy and clean speech. However, given the sparse distribution of speech energy across the time–frequency spectrogram, the noisy-to-clean mapping differs significantly between regions where speech energy is concentrated and regions where it is not. Using a single deep model to address these two distinct regression tasks simultaneously increases the complexity of the mapping relationships and consequently restricts the model's performance. To validate our hypothesis, we propose a dual-region speech enhancement model based on voiceprint region segmentation. Specifically, we first train a voiceprint segmentation model to classify noisy speech into two regions. We then build a dedicated speech enhancement model for each region, with the dual-region models concurrently constructing the noisy-to-clean mappings in their respective regions. Finally, merging the results yields the complete restored speech. Experimental results on public datasets demonstrate that our method achieves competitive speech enhancement performance, outperforming the state of the art. Ablation study results confirm the effectiveness of the proposed approach in enhancing model performance.
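The final merge step described in the abstract can be sketched with a binary region mask. The stitching rule below (mask-weighted sum of the two models' outputs) is our assumption of how "merging the results" works; the two lambda "models" are placeholders:

```python
import numpy as np

def dual_region_enhance(noisy_spec, region_mask, enhance_voice, enhance_rest):
    # Each region-specific model processes the spectrogram; the binary
    # voiceprint mask (1 = energy-concentrated region) stitches the two
    # outputs into the complete restored speech.
    return (region_mask * enhance_voice(noisy_spec)
            + (1 - region_mask) * enhance_rest(noisy_spec))

spec = np.ones((4, 4))                  # toy noisy spectrogram
mask = np.zeros((4, 4))
mask[:2] = 1                            # top rows: "energy-concentrated" region
out = dual_region_enhance(spec, mask,
                          lambda s: s * 2.0,   # placeholder voice-region model
                          lambda s: s * 0.5)   # placeholder non-voice model
```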

Citations: 0
A class-incremental learning approach for learning feature-compatible embeddings
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.neunet.2024.106685

Humans have the ability to constantly learn new knowledge. For artificial intelligence, however, attempting to continuously learn new knowledge usually results in catastrophic forgetting; existing regularization-based and dynamic-structure-based approaches have shown great potential for alleviating it. Nevertheless, these approaches have certain limitations. They usually do not fully consider the problem of incompatible feature embeddings. Instead, they tend to focus only on the features of new or previous classes and fail to consider the entire model comprehensively. Therefore, we propose a two-stage learning paradigm to solve the feature-embedding incompatibility problem. Specifically, in the first stage we retain the previous model and freeze all its parameters while dynamically expanding a new module to alleviate the incompatibility. In the second stage, a fusion knowledge distillation approach is used to compress the redundant feature dimensions. Moreover, we propose weight pruning and consolidation approaches to improve the efficiency of the model. Experimental results on the CIFAR-100, ImageNet-100 and ImageNet-1000 benchmark datasets show that the proposed approach achieves the best performance among all compared approaches. For example, on the ImageNet-100 dataset, the maximal accuracy improvement is 5.08%. Code is available at https://github.com/ybyangjing/CIL-FCE.
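Stage 1 of the paradigm, as we read it, can be sketched as a frozen old extractor running alongside a trainable expansion branch whose outputs are concatenated. The concatenation is our assumption about how the expanded module combines with the frozen model; the placeholder "extractors" are purely illustrative:

```python
import numpy as np

def stage_one_features(x, frozen_old, new_branch):
    # The previous model is frozen, so old-class embeddings stay unchanged
    # (compatible), while the dynamically expanded branch adapts to new
    # classes; stage 2 would then distill away the redundant dimensions.
    return np.concatenate([frozen_old(x), new_branch(x)], axis=-1)

old = lambda x: x @ np.eye(4)            # frozen extractor (placeholder)
new = lambda x: x @ np.ones((4, 2))      # trainable expansion (placeholder)
feat = stage_one_features(np.ones((3, 4)), old, new)
```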

Citations: 0
Relaxed stability criteria of delayed neural networks using delay-parameters-dependent slack matrices
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-30 | DOI: 10.1016/j.neunet.2024.106676

This note aims to reduce the conservatism of stability criteria for neural networks with time-varying delay. To this end, on the one hand, we construct an augmented Lyapunov–Krasovskii functional (LKF) incorporating delay-product terms that capture more information about the neural states. On the other hand, when dealing with the derivative of the LKF, we introduce several parameter-dependent slack matrices into an affine integral inequality, zero equations, and the S-procedure. As a result, more relaxed stability criteria are obtained by employing the Lyapunov–Krasovskii theorem. Two numerical examples show that the proposed stability criteria are less conservative than some existing methods.
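For context, the systems studied in this line of work typically take the following standard delayed-neural-network form (a common model from this literature, not necessarily the paper's exact notation):

```latex
\dot{x}(t) = -A\,x(t) + W_0\,g\big(x(t)\big) + W_1\,g\big(x(t-d(t))\big),
\qquad 0 \le d(t) \le h,\quad \dot{d}(t) \le \mu,
```

where $A$ is a positive diagonal matrix, $W_0$ and $W_1$ are connection-weight matrices, $g(\cdot)$ is the (sector-bounded) activation, and $d(t)$ is the time-varying delay. "Delay-product terms" in an augmented LKF are terms of the generic shape $d(t)\,\zeta^{\top}(t)\,Q\,\zeta(t)$ for an augmented state vector $\zeta(t)$; their explicit dependence on $d(t)$ is what injects extra delay information into the derivative conditions.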

Citations: 0
Image harmonization with Simple Hybrid CNN-Transformer Network
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-30 | DOI: 10.1016/j.neunet.2024.106673

Image harmonization seeks to transfer the illumination distribution of the background to the foreground within a composite image. Existing methods lack the ability to establish global–local pixel illumination dependencies between the foreground and background of composite images, which is indispensable for generating sharp, color-consistent harmonized images. To overcome this challenge, we design a novel Simple Hybrid CNN-Transformer Network (SHT-Net), formulated as an efficient symmetrical hierarchical architecture composed of two newly designed light-weight Transformer blocks. First, the scale-aware gated block captures multi-scale features through different heads and expands the receptive fields, which facilitates generating images with fine-grained details. Second, we introduce a simple parallel attention block that integrates window-based self-attention and gated channel attention in parallel, yielding simultaneous global–local modeling of pixel illumination relationships. Besides, we propose an efficient simple feed-forward network that filters out less informative features and lets the remaining features pass through to contribute to photo-realistic harmonized results. Extensive experiments on image harmonization benchmarks indicate that our method achieves promising quantitative and qualitative results. The code and pre-trained models are available at https://github.com/guanguanboy/SHT-Net.
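The parallel attention block can be sketched as two branches over the same input. Here the channel branch is a standard squeeze-style sigmoid gate, the window self-attention branch is left as a pluggable callable, and the summation fusion is our assumption (the paper only says the branches run in parallel):

```python
import numpy as np

def gated_channel_attention(x):
    # x: (C, H, W). A sigmoid gate computed from globally pooled channel
    # statistics rescales each channel (a common gated-channel form).
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * gate[:, None, None]

def parallel_attention(x, window_self_attention):
    # Both branches see the same input and run in parallel; summing their
    # outputs as the fusion step is an assumption, not taken from the paper.
    return window_self_attention(x) + gated_channel_attention(x)

x = np.zeros((2, 4, 4))
out = parallel_attention(x, lambda t: t)   # identity stands in for window MHSA
```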

Citations: 0
Deep dual incomplete multi-view multi-label classification via label semantic-guided contrastive learning
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-30 | DOI: 10.1016/j.neunet.2024.106674

Multi-view multi-label learning (MVML) aims to train a model that explores the multi-view information of an input sample to obtain accurate predictions of its multiple labels. Unfortunately, a majority of existing MVML methods are based on the assumption of data completeness, making them useless in practical applications with partially missing views or some uncertain labels. Recently, many approaches have been proposed for incomplete data, but few of them can handle the case of both missing views and missing labels. Moreover, these few existing works commonly ignore potentially valuable information about unknown labels or do not sufficiently explore latent label information. Therefore, in this paper, we propose a label semantic-guided contrastive learning method named LSGC for the dual incomplete multi-view multi-label classification problem. Concretely, LSGC employs deep neural networks to extract high-level features of samples. Inspired by the observation that exploiting label correlations improves feature discriminability, we introduce a graph convolutional network to effectively capture label semantics. Furthermore, we introduce a new sample-label contrastive loss to explore label semantic information and enhance feature representation learning. For missing labels, we adopt a pseudo-label filling strategy and develop a weighting mechanism to exploit the confidently recovered label information. We validate the framework on five standard datasets, and the experimental results show that our method achieves superior performance in comparison with state-of-the-art methods.
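A sample-label contrastive loss generally takes an InfoNCE-like form: each sample feature is pulled toward the embedding of its positive label and pushed away from the others. The sketch below is that generic form, not the paper's exact loss; the temperature and the single-positive setup are our assumptions:

```python
import numpy as np

def sample_label_contrastive(features, label_embs, pos_idx, tau=0.1):
    # InfoNCE-style sample-label contrast (hypothetical form): cosine
    # similarities between sample features and label embeddings become
    # logits; the positive label index per sample is pos_idx.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    l = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    logits = f @ l.T / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(pos_idx)), pos_idx].mean()

# Perfectly aligned features yield a near-zero loss; a shuffled assignment
# (features pointing at the wrong labels) yields a much larger one.
aligned = sample_label_contrastive(np.eye(3), np.eye(3), np.array([0, 1, 2]))
shuffled = sample_label_contrastive(np.eye(3), np.eye(3), np.array([1, 2, 0]))
```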

Citations: 0
Multimodal fusion network for ICU patient outcome prediction
IF 6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-29 | DOI: 10.1016/j.neunet.2024.106672

Over the past decades, massive Electronic Health Records (EHRs) have been accumulated in the Intensive Care Unit (ICU) and many other healthcare scenarios. The rich and comprehensive information recorded presents an exceptional opportunity for patient outcome prediction. Nevertheless, due to the diversity of data modalities, EHRs exhibit a heterogeneous character, making it difficult to organically leverage information from the various modalities. Capturing the underlying correlations among different modalities is an urgent need. In this paper, we propose a novel framework named Multimodal Fusion Network (MFNet) for ICU patient outcome prediction. First, we incorporate multiple modality-specific encoders to learn different modality representations. Notably, a graph-guided encoder is designed to capture underlying global relationships among medical codes, and a text encoder with a pre-fine-tuning strategy is adopted to extract appropriate text representations. Second, we propose to pairwise-merge the multimodal representations with a tailored hierarchical fusion mechanism. Experiments conducted on the eICU-CRD dataset validate that MFNet achieves superior performance on mortality prediction and Length of Stay (LoS) prediction compared with various representative and state-of-the-art baselines. Moreover, a comprehensive ablation study demonstrates the effectiveness of each component of MFNet.
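"Pairwise merge with a hierarchical fusion mechanism" can be read as repeatedly merging modality representations two at a time until one fused representation remains. The tournament-style reduction below is our interpretation of that phrase; the merge operator is left pluggable because the paper's tailored mechanism is not specified here:

```python
import numpy as np

def pairwise_hierarchical_fusion(modality_feats, merge):
    # Merge modality representations two at a time, level by level, until a
    # single fused representation remains (an odd leftover is carried up).
    feats = list(modality_feats)
    while len(feats) > 1:
        merged = [merge(a, b) for a, b in zip(feats[::2], feats[1::2])]
        if len(feats) % 2:
            merged.append(feats[-1])   # carry the unpaired modality upward
        feats = merged
    return feats[0]

# With summation as a stand-in merge operator, four toy "modalities"
# collapse to a single fused value in two levels.
fused = pairwise_hierarchical_fusion([1.0, 2.0, 3.0, 4.0], lambda a, b: a + b)
```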

Aperiodically intermittent quantized control-based exponential synchronization of quaternion-valued inertial neural networks 基于指数同步的四元数值惯性神经网络的非周期性间歇量化控制
IF 6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-29 DOI: 10.1016/j.neunet.2024.106669

Inertial neural networks are obtained by introducing an inertia term into Hopfield models, which makes their dynamic behavior more complex than that of traditional first-order models. Moreover, compared with conventional feedback control, aperiodically intermittent quantized control has potential advantages in reducing communication blocking and saving control cost. Motivated by these facts, we investigate the exponential synchronization of quaternion-valued inertial neural networks under aperiodically intermittent quantized control. First, a compact quaternion-valued aperiodically intermittent quantized control protocol is developed, which significantly reduces the complexity of the theoretical derivation. Subsequently, several concise criteria involving matrix inequalities are formulated by constructing a Lyapunov functional and employing a direct analysis approach. The correctness of the obtained results is finally verified by a typical example.

Backdoor attacks on unsupervised graph representation learning 对无监督图表示学习的后门攻击。
IF 6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-29 DOI: 10.1016/j.neunet.2024.106668

Unsupervised graph learning techniques have garnered increasing interest among researchers. These methods maximize mutual information to generate representations of nodes and graphs. We show that they are susceptible to backdoor attacks, wherein an adversary can poison a small portion of unlabeled graph data (e.g., node features and graph structure) by introducing triggers into the graph. This tampering corrupts the learned representations and increases the risk to various downstream applications. Previous backdoor attacks in supervised learning primarily operate directly on the label space and may not be suitable for unlabeled graph data. To tackle this challenge, we introduce GRBA, a gradient-based first-order backdoor attack method. To the best of our knowledge, this is a pioneering effort to investigate backdoor attacks in the domain of unsupervised graph learning. The method requires no prior knowledge of downstream tasks, as it operates directly on representations. Furthermore, it is versatile and can be applied to various downstream tasks, including node classification, node clustering, and graph classification. We evaluate GRBA on state-of-the-art unsupervised learning models, and the experimental results substantiate the effectiveness and evasiveness of GRBA in both node-level and graph-level tasks.
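The poisoning step the abstract describes — stamping a trigger onto the features of a small fraction of nodes and wiring in a trigger edge pattern, with no labels involved — can be sketched as follows. This is only in the spirit of the attack: the fixed feature trigger, the hub-edge pattern, the poison rate, and all names (`poison`, `TRIGGER`) are illustrative assumptions, not GRBA's gradient-based trigger optimization:

```python
import random

# Illustrative backdoor poisoning of unlabeled graph data: a fixed
# feature pattern is stamped onto a few victim nodes and a trigger
# edge structure connects them. GRBA itself optimizes the trigger
# with first-order gradients; here the trigger is hand-picked.

TRIGGER = [9.0, 9.0, 0.0]   # assumed fixed feature trigger
POISON_RATE = 0.1           # fraction of nodes to poison

def poison(features, edges, rate=POISON_RATE, seed=0):
    rng = random.Random(seed)
    n = len(features)
    victims = rng.sample(range(n), max(1, int(rate * n)))
    feats = [list(f) for f in features]
    new_edges = list(edges)
    hub = victims[0]
    for v in victims:
        feats[v][:len(TRIGGER)] = TRIGGER   # stamp the feature trigger
        if v != hub:
            new_edges.append((v, hub))      # wire the trigger structure
    return feats, new_edges, victims

# a 20-node path graph with all-zero features as the clean data
features = [[0.0, 0.0, 0.0] for _ in range(20)]
edges = [(i, i + 1) for i in range(19)]
pf, pe, victims = poison(features, edges)
print(len(victims), len(pe) - len(edges))  # → 2 1
```

Because no labels are touched, the poisoned data looks like ordinary unlabeled input to the representation learner; the attack succeeds if the learned encoder maps triggered inputs to an adversary-chosen region of representation space.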
