
Latest publications in Neurocomputing

Differentially private and explainable boosting machine with enhanced utility
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-23 | DOI: 10.1016/j.neucom.2024.128424

In this paper, we introduce DP-EBM*, an enhanced-utility version of the Differentially Private Explainable Boosting Machine (DP-EBM). DP-EBM* makes predictions for both classification and regression tasks, providing inherent explanations for its predictions while protecting sensitive individual information via Differential Privacy. DP-EBM* improves on DP-EBM in two major ways. First, we develop an error measure that assesses how efficiently the privacy budget is used, a crucial factor for accuracy, and we optimize this measure. Second, we propose a feature pruning method that eliminates less important features during training. Our experimental results demonstrate that DP-EBM* outperforms state-of-the-art differentially private explainable models.
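As a hedged illustration of the role the privacy budget plays, the sketch below applies the generic Laplace mechanism to histogram counts and shows how pruning features stretches the remaining budget across fewer queries; `laplace_histogram` and `per_query_epsilon` are hypothetical helpers, not DP-EBM*'s actual mechanism.

```python
import numpy as np

def laplace_histogram(values, bins, epsilon, sensitivity=1.0, rng=None):
    """Release an epsilon-differentially-private histogram of `values` by
    adding Laplace(sensitivity/epsilon) noise to each bin count."""
    rng = np.random.default_rng() if rng is None else rng
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(scale=sensitivity / epsilon, size=counts.shape)
    return noisy, edges

def per_query_epsilon(total_epsilon, n_features, n_rounds):
    """Split a total budget evenly over every (feature, round) query; pruning
    features leaves more budget, hence less noise, per remaining query."""
    return total_epsilon / (n_features * n_rounds)
```

With half the features, each remaining query gets twice the budget, so the Laplace noise scale halves; this is the intuition behind why pruning can improve utility under a fixed total epsilon.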

Citations: 0
GNN-based multi-source domain prototype representation for cross-subject EEG emotion recognition
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128445

Emotion recognition based on electroencephalography (EEG) signals is a major area of affective computing. However, distributional differences between subjects have greatly hindered the large-scale application of EEG emotion recognition techniques. Most existing cross-subject methods treat multiple subjects as a single source domain, which leads to significant distributional differences within that domain and hinders the model’s ability to generalise effectively to target subjects. In this paper, we propose a new method that combines graph-neural-network-based prototype representation of multiple source domains with a clustering similarity loss. It consists of three parts: multi-source domain prototype representation, a graph neural network, and the losses. The multi-source domain prototype representation treats each subject in the source domain as its own sub-source domain and extracts prototype features, learning a more fine-grained feature representation. The graph neural network better models the association between prototypes and samples. In addition, we propose a similarity loss based on the clustering idea, which makes maximum use of the similarity between samples in the target domain while ensuring that classification performance does not degrade. We conduct extensive experiments on two benchmark datasets, SEED and SEED IV. The experimental results validate the effectiveness of the proposed multi-source domain fusion approach and indicate its superiority over existing methods in cross-subject classification tasks.
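A minimal sketch of the sub-source-domain prototype idea, assuming a prototype is simply the class-mean feature vector per subject (the paper's GNN refinement and similarity loss are omitted); `subject_prototypes` is a hypothetical helper.

```python
import numpy as np

def subject_prototypes(features, labels, subjects):
    """Compute one prototype (class-mean feature vector) per (subject, class)
    pair, treating each source-domain subject as its own sub-source domain."""
    protos = {}
    for s in np.unique(subjects):
        for c in np.unique(labels):
            mask = (subjects == s) & (labels == c)
            if mask.any():
                protos[(s, c)] = features[mask].mean(axis=0)
    return protos
```

Such per-subject prototypes, rather than one prototype per class over the pooled source domain, are what give the "more fine-grained feature representation" the abstract refers to.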

Citations: 0
Overcoming language priors in visual question answering with cumulative learning strategy
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128419

The performance of visual question answering (VQA) has witnessed great progress over the last few years. However, many current VQA models rely on superficial linguistic correlations between questions and answers, often failing to sufficiently learn multi-modal knowledge from both vision and language, and thus suffer significant performance drops. To address this issue, the VQA-CP v2.0 dataset was developed to reduce language biases by greedily re-partitioning the distribution of VQA v2.0’s training and test sets. Since achieving high performance on real-world datasets requires effective learning from minor classes, in this paper we analyze the skewed long-tail distributions present in the VQA-CP v2.0 dataset and propose a new ensemble-based, parameter-insensitive framework. This framework is built on two representation learning branches and a joint learning block, designed to reduce language biases in VQA tasks. Specifically, the representation learning branches ensure strong representative ability learned from both the major and minor classes. The joint learning block forces the model to concentrate initially on major classes for robust representation and then gradually shift its focus towards minor classes for classification as training progresses. Experimental results demonstrate that our approach outperforms state-of-the-art works on the VQA-CP v2.0 dataset without requiring additional annotations. Notably, on the “num” type, our framework exceeds the second-best method (without extra annotations) by 8.64%. Meanwhile, our approach does not sacrifice accuracy on the VQA v2.0 dataset compared with the baseline model.
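The major-to-minor shift described above can be sketched with a cumulative weighting schedule; the parabolic decay below is borrowed from the cumulative-learning literature and is only an assumption about how such a shift might be implemented, not the paper's exact schedule.

```python
import numpy as np

def cumulative_class_weights(class_freq, t, T):
    """Interpolate from frequency-proportional weights (favouring major
    classes) to inverse-frequency weights (favouring minor classes) as
    training advances from epoch t=0 to t=T, using a parabolic alpha decay."""
    alpha = 1.0 - (t / T) ** 2
    head = class_freq / class_freq.sum()
    tail = (1.0 / class_freq) / (1.0 / class_freq).sum()
    w = alpha * head + (1 - alpha) * tail
    return w / w.sum()
```

Early in training the loss weight of a 3:1 majority class dominates; by the final epoch the weighting has flipped in favour of the tail class.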

Citations: 0
From text to mask: Localizing entities using the attention of text-to-image diffusion models
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128437

Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information underlies their remarkable capability to generate highly text-related images. From another perspective, these generative models carry clues about the precise correlation between words and pixels. This work proposes a simple but effective method to exploit the attention mechanism in the denoising network of text-to-image diffusion models. Without additional training or inference-time optimization, the semantic grounding of phrases can be obtained directly. We evaluate our method on Pascal VOC 2012 and Microsoft COCO 2014 under the weakly-supervised semantic segmentation setting, and our method achieves superior performance to prior methods. In addition, the acquired word-pixel correlation generalizes to the learned text embeddings of customized generation methods, requiring only a few modifications. To validate our discovery, we introduce a new practical task called “personalized referring image segmentation” together with a new dataset. Experiments in various situations demonstrate the advantages of our method over strong baselines on this task. In summary, our work reveals a novel way to extract the rich multi-modal knowledge hidden in diffusion models for segmentation.
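A hedged sketch of how per-token cross-attention maps could be turned into a localization mask: average over denoising steps and heads, normalise, and threshold. The array layout `(steps, heads, H, W, tokens)` and the function name are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def token_mask_from_attention(attn_maps, token_idx, threshold=0.5):
    """Average one token's cross-attention maps over denoising steps and
    attention heads, min-max normalise to [0, 1], and threshold to obtain a
    binary localisation mask for the entity that token names."""
    a = attn_maps[..., token_idx]   # (steps, heads, H, W): select the token
    a = a.mean(axis=(0, 1))         # average over steps and heads -> (H, W)
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)
    return a >= threshold
```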

Citations: 0
Clusterwise Independent Component Analysis (C-ICA): An R package for clustering subjects based on ICA patterns underlying three-way (brain) data
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128396

In many areas of science, such as neuroscience, genomics, and text mining, several important and challenging research questions involve the study of (subject) heterogeneity present in three-way data. In clinical neuroscience, for example, disclosing differences or heterogeneity between subjects in the resting state networks (RSNs) underlying multi-subject fMRI data (i.e., time by voxel by subject three-way data) may advance the subtyping of psychiatric and mental diseases. Recently, the Clusterwise Independent Component Analysis (C-ICA) method was proposed, which enables the disclosure of between-subject heterogeneity in RSNs present in multi-subject rs-fMRI data [1]. Up to now, however, no publicly available software exists for fitting C-ICA to empirical data at hand. The goal of this paper, therefore, is to present the CICA R package, which contains the necessary functions to estimate the C-ICA parameters and to interpret and visualize the analysis output. The package also includes functions to select suitable initial values for the C-ICA model parameters and to determine the optimal number of clusters and components for a given empirical data set (i.e., model selection). The use of the main functions of the package is discussed and demonstrated with simulated data, and the analytical choices the user has to make (e.g., starting values) are explained and shown step by step. The rich functionality of the package is further illustrated by applying C-ICA to empirical rs-fMRI data from a group of Alzheimer's patients and elderly control subjects and to multi-country stock market data. Finally, extensions of the C-ICA algorithm and model selection procedures that could be implemented in future releases of the package are discussed.

Citations: 0
Semantic dependency and local convolution for enhancing naturalness and tone in text-to-speech synthesis
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128430

Self-attention-based networks have become increasingly popular due to their exceptional performance in parallel training and global context modeling. However, they may fall short of capturing local dependencies, particularly in datasets with strong local correlations. To address this challenge, we propose a novel method that utilizes semantic dependency to extract linguistic information from the original text: the semantic relationships between nodes serve as prior knowledge to refine the self-attention distribution. Additionally, to better fuse local contextual information, we introduce a one-dimensional convolutional neural network to generate the query and value matrices in the self-attention mechanism, taking advantage of the strong correlation between input characters. We apply this variant of the self-attention network to text-to-speech tasks and propose a non-autoregressive neural text-to-speech model. To enhance pronunciation accuracy, we separate tones from phonemes as independent features in model training. Experimental results show that our model performs well in speech synthesis; in particular, the proposed method significantly improves the handling of pause, stress, and intonation in speech.
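The convolution-generated queries and values can be sketched as follows, assuming single-head attention over a `(T, d)` character sequence and a depthwise kernel; this is an illustrative reading of the mechanism, not the authors' implementation.

```python
import numpy as np

def conv1d_seq(x, w):
    """Depthwise 1-D convolution along the time axis with 'same' padding.
    x: (T, d) sequence of d-dim embeddings; w: (k, d) per-channel kernel, k odd."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])

def local_conv_attention(x, wq, wv, wk):
    """Self-attention where queries and values come from local convolutions
    (capturing correlations between neighbouring characters) while keys come
    from a plain linear map."""
    q = conv1d_seq(x, wq)
    v = conv1d_seq(x, wv)
    keys = x @ wk
    scores = q @ keys.T / np.sqrt(x.shape[1])
    a = np.exp(scores - scores.max(axis=1, keepdims=True))
    a = a / a.sum(axis=1, keepdims=True)
    return a @ v
```

Because each query already mixes a small neighbourhood of characters, the attention distribution is biased towards locally coherent matches before any global context is applied.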

Citations: 0
A deep top-down framework towards generalisable multi-view pedestrian detection
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128458

Multiple cameras have frequently been used to detect heavily occluded pedestrians. State-of-the-art methods for deep multi-view pedestrian detection usually project the feature maps extracted from multiple views to the ground plane through homographies for information fusion. However, this bottom-up approach can easily overfit the camera locations and orientations of a training dataset, which leads to weak generalisation performance and compromises real-world applications. To address this problem, a deep top-down framework, TMVD, is proposed: in each view, feature maps are taken within rectangular boxes of average pedestrian size placed at each cell of the discretized ground plane, then weighted and embedded in a top view. A convolutional neural network uses them to infer the locations of pedestrians. The proposed method significantly improves generalisation performance compared with benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms other top-down methods.
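The homography projection used by the bottom-up baselines can be sketched as a plain point warp onto the ground plane; `warp_points` is a hypothetical helper, and TMVD's box-embedding step is not reproduced here.

```python
import numpy as np

def warp_points(H, pts):
    """Project N image points (x, y) to ground-plane coordinates with a 3x3
    homography H, in homogeneous coordinates with perspective division."""
    P = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous points
    q = P @ H.T
    return q[:, :2] / q[:, 2:3]
```

Because H encodes a specific camera pose, any network feature computed after this warp inherits that pose, which is the overfitting risk the top-down design avoids.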

Citations: 0
A general method for mode decomposition on additive mixture: Generalized Variational Mode Decomposition and its sequentialization
IF 5.5 | CAS Region 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.neucom.2024.128390

The Variational Mode Decomposition (VMD) method was proposed to separate non-stationary signal mixtures by solving an optimization problem. The method is powerful and can reconstruct signal components precisely when they are orthogonal (or quasi-orthogonal) in the frequency domain. The crucial limitation of VMD is that it requires the number of modes to be known before decomposition. Its applications have also been confined to the 1D and 2D signal processing fields, a narrow scope.

In this paper, by inheriting and developing the core idea of VMD, we build a general form of the method and extend it to the modal decomposition of common additive mixtures, no longer limited to signal processing. To overcome the obstacle of the modal number, we sequentialize the generalized VMD method so that the modes can be extracted one by one, without knowing the modal number a priori. After generalizing and sequentializing VMD, we apply the methods to different additive settings, such as texture segmentation, Gaussian Mixture Models (GMM), and clustering. From the experiments, we conclude that the generalized and sequentialized VMD methods can solve a variety of classical problems from the viewpoint of modal decomposition, which implies that our methods have high generality and wide applicability. Raw Matlab code for the algorithm is available at https://github.com/changwangke/SGVMD_additive_Clustering/blob/main/SGVMD_clustering.m.
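For reference, the two closed-form updates at the heart of the original VMD ADMM loop (Wiener-filtered mode spectra and spectral-centroid centre frequencies) can be sketched as follows; this is the standard VMD update from the original formulation, not the paper's generalized or sequentialized variant.

```python
import numpy as np

def vmd_mode_update(f_hat, sum_others_hat, lam_hat, omega, omega_k, alpha):
    """Wiener-filter update for mode k: the residual spectrum (signal minus
    the other modes, plus the Lagrangian term) is band-pass filtered around
    the current centre frequency omega_k with bandwidth set by alpha."""
    return (f_hat - sum_others_hat + lam_hat / 2) / (
        1 + 2 * alpha * (omega - omega_k) ** 2)

def center_frequency(u_hat, omega):
    """Centre-frequency update: spectral centroid of the mode's power spectrum."""
    p = np.abs(u_hat) ** 2
    return (omega * p).sum() / p.sum()
```

Alternating these two updates over all modes is what concentrates each mode around a single centre frequency, which is why quasi-orthogonality in the frequency domain makes the reconstruction precise.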

引用次数: 0
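The sequential extraction described in the abstract above can be sketched in a few lines: peel one narrow-band mode at a time from the residual until the residual energy is negligible, so the number of modes is never needed in advance. This is an illustrative simplification (a single-mode Wiener-style update with a fixed bandwidth penalty `alpha` and no Lagrangian dual update), not the authors' SGVMD algorithm; their Matlab code is at the linked repository.

```python
import numpy as np

def extract_mode(f, alpha=2000.0, tol=1e-7, max_iter=500):
    """Pull one narrow-band mode out of f with a Wiener-style spectral update."""
    N = len(f)
    freqs = np.fft.fftfreq(N)                      # normalized frequency axis
    f_hat = np.fft.fft(f)
    # initialize the center frequency at the strongest spectral peak
    omega = freqs[np.argmax(np.abs(f_hat[: N // 2]))]
    u_hat = np.zeros(N, dtype=complex)
    for _ in range(max_iter):
        u_prev = u_hat
        # Wiener-style filter concentrated around omega (bandwidth set by alpha)
        u_hat = f_hat / (1.0 + 2.0 * alpha * (np.abs(freqs) - omega) ** 2)
        # re-center omega at the spectral centroid of the current mode
        power = np.abs(u_hat[: N // 2]) ** 2
        omega = np.sum(freqs[: N // 2] * power) / (np.sum(power) + 1e-12)
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    return np.real(np.fft.ifft(u_hat))

def sequential_vmd(f, energy_ratio=0.05, max_modes=8):
    """Extract modes one by one until the residual energy is negligible,
    so the modal number never has to be known a priori."""
    residual = np.asarray(f, dtype=float).copy()
    modes = []
    e0 = np.sum(residual ** 2)
    while len(modes) < max_modes and np.sum(residual ** 2) > energy_ratio * e0:
        u = extract_mode(residual)
        modes.append(u)
        residual = residual - u
    return modes, residual
```

With two well-separated tones the loop stops after two modes; the `energy_ratio` stopping rule plays the role of the unknown modal number.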
Self-organizing hypercomplex-valued adaptive network 自组织超复值自适应网络
IF 5.5 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-22 DOI: 10.1016/j.neucom.2024.128429

A novel unsupervised artificial intelligence system is presented whose input signals and trainable weights consist of complex or hypercomplex values. The system exploits the property of complex multiplication that the multiplicand is not only scaled but also rotated. The more similar an input signal is to the reference signal, the more likely the input signal belongs to the corresponding class. The data assigned to a class during training is stored both on a generic layer and on a layer extracting special features of the signal. As a result, the same cluster can hold a general description as well as the details of the signal. This property is vital for assigning a signal to an existing or a new class. To ensure that only valid new classes are opened, the system determines the variances by comparing each input signal component with the weights and adaptively adjusts its activation and threshold functions for an optimal classification decision. The presented system knows all boundaries of its clusters at any time. Experiments demonstrate that the system clusters data from multiple classes autonomously, quickly, and with high accuracy.

本文介绍了一种新型的无监督人工智能系统,其输入信号和可训练权重由复数或超复数值组成。该系统利用了复数乘法的效果,即乘方不仅被缩放,而且还被旋转。输入信号与参考信号越相似,输入信号就越有可能属于相应的类别。在训练过程中分配给一个类别的数据会存储在一个通用层和一个提取信号特殊特征的层上。因此,同一个簇既能保存一般描述,也能保存信号的细节。这一特性对于将信号分配到现有类别或新类别至关重要。为确保只开设有效的新类别,系统通过比较每个输入信号分量和权重来确定方差,并自适应地调整其激活和阈值函数,以做出最佳分类决策。所介绍的系统随时了解其聚类的所有边界。实验证明,该系统能够自主、快速、高精度地对多个类别的数据进行聚类。
引用次数: 0
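As a toy illustration of the mechanism sketched in the abstract above (complex multiplication scales and rotates, so a phase-aligned inner product measures how well an input matches a stored reference, and a new class is opened when no reference is similar enough), one might write the following. The class and its `threshold` and `lr` parameters are hypothetical, not the paper's network:

```python
import numpy as np

class ComplexPrototypeClusterer:
    """Illustrative sketch: unsupervised clustering with complex-valued prototypes."""

    def __init__(self, threshold=0.9, lr=0.1):
        self.protos = []          # one complex prototype vector per class
        self.threshold = threshold
        self.lr = lr

    def _similarity(self, x, w):
        # Normalized magnitude of the complex inner product: equals 1.0 when
        # the input matches the reference up to a global scale and rotation.
        return np.abs(np.vdot(w, x)) / (np.linalg.norm(w) * np.linalg.norm(x) + 1e-12)

    def partial_fit(self, x):
        # Assign x to the most similar existing class, or open a new one.
        sims = [self._similarity(x, w) for w in self.protos]
        if sims and max(sims) >= self.threshold:
            k = int(np.argmax(sims))
            # pull the winning prototype toward the new sample
            self.protos[k] = (1 - self.lr) * self.protos[k] + self.lr * x
        else:
            k = len(self.protos)  # no reference is similar enough: new class
            self.protos.append(np.asarray(x, dtype=complex))
        return k
```

Because the similarity is invariant to a global complex factor, a scaled and rotated copy of a known pattern is still assigned to its original class.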
Memory-efficient DRASiW Models 内存效率高的 DRASiW 模型
IF 5.5 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-08-22 DOI: 10.1016/j.neucom.2024.128443

Weightless Neural Networks (WNN) are ideal for Federated Learning due to their robustness and computational efficiency. These scenarios require models with a small memory footprint and the ability to aggregate knowledge from multiple models. In this work, we demonstrate the effectiveness of using Bloom filter variations to implement DRASiW models—an adaptation of WNN that records both the presence and frequency of patterns—with minimized memory usage. Across various datasets, DRASiW models show competitive performance compared to models like Random Forest, k-Nearest Neighbors, Multi-layer Perceptron, and Support Vector Machines, with an acceptable space trade-off. Furthermore, our findings indicate that Bloom filter variations, such as Count Min Sketch, can reduce the memory footprint of DRASiW models by up to 27% while maintaining performance and enabling distributed and federated learning strategies.

无权重神经网络(WNN)具有鲁棒性和计算效率高的特点,是联合学习(Federated Learning)的理想选择。这些应用场景要求模型内存占用小,并能聚合来自多个模型的知识。在这项工作中,我们展示了使用布鲁姆过滤器变体实现 DRASiW 模型的有效性--DRASiW 模型是对 WNN 的一种调整,可同时记录模式的存在和频率,并将内存使用量降至最低。在各种数据集上,DRASiW 模型与随机森林(Random Forest)、k-近邻(k-Nearest Neighbors)、多层感知器(Multi-layer Perceptron)和支持向量机(Support Vector Machines)等模型相比,在可接受的空间权衡条件下,表现出了极具竞争力的性能。此外,我们的研究结果表明,Bloom 过滤器的变化(如 Count Min Sketch)可将 DRASiW 模型的内存占用减少多达 27%,同时保持性能并支持分布式和联合学习策略。
引用次数: 0
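The memory trick mentioned in the abstract above, replacing exact per-pattern counters with a Count-Min Sketch whose hash collisions can only over-estimate a frequency, can be illustrated in a few lines. The `width`, `depth`, and hash construction here are illustrative choices, not values from the paper:

```python
import hashlib
import numpy as np

class CountMinSketch:
    """Minimal Count-Min Sketch: a fixed-size table of counters that answers
    approximate frequency queries and never under-estimates a count."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.uint32)

    def _index(self, item, row):
        # one independent hash function per row, derived from a keyed digest
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row, self._index(item, row)] += count

    def estimate(self, item):
        # collisions can only inflate a row's counter, so the minimum across
        # rows is the tightest (and still one-sided) estimate
        return int(min(self.table[row, self._index(item, row)]
                       for row in range(self.depth)))
```

The memory footprint is fixed at `depth * width` counters regardless of how many distinct patterns are observed, which is what makes such structures attractive for small-footprint federated models.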
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1