2022 International Joint Conference on Neural Networks (IJCNN): Latest Papers

Dynamic Relation-Aware Multiple Instance Learning for Few-Shot Learning

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892340

Kai Zheng, Liu Cheng, Jiehong Shen

Leveraging patch-level embeddings in few-shot learning has been widely studied in recent work. However, a fundamental challenge is that labels are assigned at the image level, whereas patch-level annotations are missing. We observe that this setting exactly matches that of multiple instance learning (MIL), and we propose to combine multiple instance learning with few-shot learning. Specifically, we propose a dynamic relation-aware multiple instance learning framework that explicitly models the spatial and semantic relations among instances and performs iterative aggregation. Extensive experiments demonstrate that the proposed method achieves competitive results compared with state-of-the-art methods.
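To make the image-level-label setting above concrete, here is a minimal sketch of generic attention-based MIL pooling, where patch embeddings (instances) are aggregated into one image-level (bag) embedding. It is not the paper's dynamic relation-aware framework: the function name `mil_attention_pool` and the scoring vector `w` are hypothetical, and a real model would learn `w` rather than fix it.

```python
import math

def mil_attention_pool(patches, w):
    """Aggregate patch embeddings into one bag embedding.

    patches: list of equal-length float lists (one per patch).
    w: hypothetical scoring vector (learned in a real model).
    Returns (bag_embedding, attention_weights).
    """
    # Score each patch with a dot product against w.
    scores = [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in patches]
    # Softmax over patches, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The bag embedding is the attention-weighted sum of patch embeddings.
    dim = len(patches[0])
    bag = [sum(a * p[d] for a, p in zip(weights, patches)) for d in range(dim)]
    return bag, weights

patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bag, weights = mil_attention_pool(patches, w=[0.0, 0.0])
```

With an all-zero `w`, every patch receives the same weight and the bag embedding reduces to the mean of the patches. A non-zero `w` lets some patches dominate the image-level representation.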

Citations: 0

Neurons Perception Dataset for RoboMaster AI Challenge

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892040

Haoran Li, Zicheng Duan, Jiaqi Li, Mingjun Ma, Yaran Chen, Dongbin Zhao

From virtual games to physical robots, games have witnessed the development of artificial intelligence (AI) technology, especially the data-driven technology represented by deep learning. Compared with virtual games, a physical robot game such as the RoboMaster AI Challenge requires a complete closed-loop architecture composed of perception, planning, control, and decision-making to support autonomous confrontation. Perception is the eye of the robot, and its performance in complex environments depends on a massive dataset. Although many open perception datasets exist, they struggle to meet the needs of the RoboMaster AI Challenge because of the high dynamics of the task, the distinctiveness of the objects, and limited computing resources. In this paper, we release the Neurons perception dataset for the RoboMaster AI Challenge (Neurons is a team dedicated to promoting the development of robots with deep neural networks; the code and dataset will be released at https://github.com/DRL-CASIA/NeuronsDataset). The dataset covers three tasks, namely monocular depth estimation, lightweight object detection, and multi-view 3D object detection, and fills the data gap in this field. In addition, we evaluate state-of-the-art (SOTA) methods on each task, hoping to provide an impartial benchmark for the development of perception algorithms.

Citations: 1

Learning Generalisable Representations for Offline Signature Verification

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892224

Xianmu Cairang, Duojie Zhaxi, Xiaolong Yang, Yan Hou, Qijun Zhao, Dingguo Gao, Pubu Danzeng, Dorji Gesang

Current offline signature verification methods based on deep learning have achieved promising results, but they degrade greatly in cross-domain settings. What is needed is an efficient offline signature verification model that combines high performance with cross-domain deployment without any adaptation. In this paper, we propose a novel approach to learning generalisable representations for offline signature verification. First, we use a Siamese network combined with Triplet loss and Cross Entropy (CE) loss to learn discriminative features. Second, we introduce Instance Normalization (IN) into the network to cope with cross-domain discrepancies and propose an Inference Layer Normalization Neck (ILNNeck) module to further improve model generalization. We evaluate the method on our self-collected Multilingual Signature dataset (MLSig) and three public datasets: BHSig-H, BHSig-B, and CEDAR. Results show that while our method achieves comparable results in the single-domain setting, it is clearly superior to state-of-the-art methods in the cross-domain setting.
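The combination of Triplet loss and Cross Entropy loss mentioned above can be sketched in a few lines. This is a generic illustration, assuming a Euclidean distance, a margin of 1.0, and a simple weighted sum; the balancing factor `ce_weight` and the function names are hypothetical and not taken from the paper. In a signature setting the anchor and positive would typically be embeddings of genuine signatures from one writer and the negative an embedding of a forgery or another writer's signature.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def combined_loss(anchor, positive, negative, logits, label, ce_weight=1.0):
    # ce_weight is a hypothetical balancing factor, not a value from the paper.
    metric_term = triplet_loss(anchor, positive, negative)
    return metric_term + ce_weight * cross_entropy(logits, label)

# The negative is already far from the anchor, so the triplet term vanishes.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
# The roles are swapped, so the triplet term is large.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

The triplet term shapes the embedding space (same-writer samples close, others far), while the cross-entropy term adds a classification signal on top of the same features.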

Citations: 1

Model-Agnostic Causal Principle for Unbiased KPI Anomaly Detection

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892664

Jiemin Ji, D. Guan, Yuwen Deng, Weiwei Yuan

KPI anomaly detection plays an important role in operation and maintenance. Because incomplete or missing labels are common, methods based on the Variational Auto-Encoder (VAE) are widely used. These methods assume that the normal patterns, which are in the majority, will be learned, but this assumption is hard to satisfy because abnormal patterns are inevitably embedded. Existing debiasing methods merely use anomaly labels to eliminate bias in the decoding process, but the latent representation generated by the encoder can still be biased, and even ill-defined, when the input KPIs are too abnormal. We propose a model-agnostic causal principle to make such VAE-based models unbiased. When the ELBO (evidence lower bound) is modified to use anomaly labels, our causal principle indicates that the anomaly labels are confounders between the training data and the learned representations, which leads to the aforementioned bias. Our principle also implements a do-operation to cut off the causal path from anomaly labels to training data. Through the do-operation, we can eliminate the anomaly bias in the encoder and reconstruct normal patterns more frequently in the decoder. Our causal variants of existing VAE-based models, CausalDonut and CausalBagel, improve the F1-score by up to 5% compared with Donut and Bagel and surpass state-of-the-art supervised and unsupervised models. To empirically demonstrate the debiasing capability of our method, we also compare anomaly scores between the baselines and our models. In addition, the learning process of our principle is interpreted from an entropy perspective.
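For reference, the standard evidence lower bound that VAE-based detectors optimize can be written as below. This is the textbook form, not the label-aware modification described in the abstract; here $x$ is assumed to be a window of KPI values, $z$ the latent variable, $q_\phi$ the encoder, $p_\theta$ the decoder, and $p(z)$ the prior.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The first term rewards accurate reconstruction and the second keeps the encoder's posterior close to the prior, which is why anomalous inputs in the training data can bias both the decoder and the latent representation.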

Citations: 0

HE-SNE: Heterogeneous Event Sequence-based Streaming Network Embedding for Dynamic Behaviors

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892872

Yifan Wang, Jianhao Shen, Yiping Song, Sheng Wang, Ming Zhang

Large amounts of user behavior data provide opportunities for user behavior modeling and have great potential in many downstream applications such as advertising and anomaly detection. Compared with traditional methods, embedding-based methods have recently been used more often because of their efficiency and scalability. These methods build a “behavior-entity” bipartite graph and learn static embeddings for nodes in the graph. However, behavior patterns in the real world are not static, because entity properties such as user interests usually evolve over time. In this paper, we formulate user behaviors as a temporal event sequence and propose a streaming network embedding approach to capture the evolving nature of user behaviors. A representation of each event is built and used to update the embeddings of nodes. Two contextual behavior modeling tasks are studied for dynamic user behaviors, and experimental results on real-world data demonstrate the effectiveness of the proposed approach over several competitive baselines.

Citations: 2

Hypergraph Neural Network Hawkes Process

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892328

Zibo Cheng, Jian-wei Liu, Ze Cao

In real-world applications, temporal asynchronous event sequences are ubiquitous, for example in social networks, financial engineering, and medical diagnostics. These data usually exhibit intrinsic high-order dependencies. To this end, we propose a hypergraph neural network Hawkes process (HGHP) model, which extracts high-order correlations from the data through a hypergraph neural network and encodes the dependency relationships into the hypergraph structure. When processing event sequence data, the method obtains the correlation matrix between different events through hyperedge convolution, and then obtains the latent representation of the event sequence based on these correlations. We conduct experiments on multiple public datasets. The proposed HGHP model achieves 86.6% accuracy on the MIMIC-II dataset, 62.42% on the Financial dataset, and 46.79% on the Stackoverflow dataset, outperforming existing baseline models.
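As background for the Hawkes process that the model builds on, the sketch below computes the conditional intensity of a classical univariate Hawkes process with an exponential kernel. It is not the HGHP model, which replaces this hand-written kernel with learned representations; the parameter values `mu`, `alpha`, and `beta` are illustrative assumptions.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process.

    lambda(t) = mu + sum over past events t_i < t of
                alpha * exp(-beta * (t - t_i))

    mu is the base rate, alpha the jump caused by each past event,
    and beta the rate at which that excitation decays.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return mu + excitation

# No past events: only the base rate remains.
baseline = hawkes_intensity(1.0, [])
# One event at time 0 raises the intensity at time 1.
excited = hawkes_intensity(1.0, [0.0])
```

Each past event temporarily raises the probability of further events, which is the self-exciting behavior that makes Hawkes processes a natural fit for asynchronous event sequences.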

Citations: 0

Pulmonary Nodule Classification with Multi-View Convolutional Vision Transformer

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892716

Yuxuan Xiong, Bo Du, Yongchao Xu, J. Deng, Y. She, Chang Chen

Pulmonary nodule classification from computerized tomography (CT) scans is a vital task for the early screening of lung cancer. The algorithm aims to distinguish malignant pulmonary nodules, benign nodules, and their subtypes. In this paper, we define a detailed pulmonary nodule classification task with 5 semantic labels. Such a task poses a series of non-trivial problems. First, the medical image data available for training is quite limited. We enlarge the training dataset by cropping out a three-dimensional (3D) volume around each pulmonary nodule and generating 15 planes with different orientations from these volumes. Second, the global modeling ability of existing convolutional neural network (CNN) based architectures does not meet the needs of medical image analysis well. To learn discriminative abstract information, we down-sample feature maps between successive stages and adopt the BotNet-50 backbone, which combines a ResNet backbone with self-attention modules. Such an architecture can extract local information in low-level layers and non-local information in high-level layers. Last but not least, training data and testing data do not share a similar distribution in real-world multi-center medical image classification scenarios. We assign modified weights to the samples when calculating the loss value for optimization. The proposed method can eliminate the spurious correlation between features and labels. Experiments demonstrate the effectiveness of each component.

Citations: 2

FCH-TTS: Fast, Controllable and High-quality Non-Autoregressive Text-to-Speech Synthesis

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892512

Xun Zhou, Zhiyang Zhou, Xiaodon Shi

Inspired by the success of the non-autoregressive speech synthesis model FastSpeech, we propose FCH-TTS, a fast, controllable and universal neural text-to-speech (TTS) model capable of generating high-quality spectrograms. The basic architecture of FCH-TTS is similar to that of FastSpeech, but FCH-TTS replaces the complex teacher model in FastSpeech with a simple yet effective attention-based soft alignment mechanism, allowing the model to adapt better to different languages. In addition to controlling voice speed and prosody, a fusion module is designed to better model speaker features and obtain the desired timbre. Meanwhile, several special loss functions are applied to ensure the quality of the output mel-spectrogram. Experimental results on the LJSpeech dataset show that FCH-TTS achieves the fastest inference speed among all baseline models while also achieving the best speech quality. In addition, the controllability of the model with respect to prosody, voice speed and timbre is validated on several datasets, and its good performance on the low-resource Tibetan dataset demonstrates the universality of the model.

Citations: 2

Shaping the Ultra-Selectivity of a Looming Detection Neural Network from Non-linear Correlation of Radial Motion

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892408

Mu Hua, Qinbing Fu, Jigen Peng, Shigang Yue, Hao Luan

In this paper, we propose a numerical neural network that uses non-linear computation and is inspired by the lobula plate/lobula columnar type II (LPLC2) neurons, the ultra-selective looming-sensitive neurons identified within the visual system of Drosophila. The method is an exploration towards solving the collision perception problem resulting from radial motion. Taking inspiration from the distinctive structure and placement of the directionally selective neurons (DSNs) named T4/T5 interneurons and their post-synaptic neurons, the motion opponency along four cardinal directions is computed in a non-linear way and subsequently mapped into four quadrants. More precisely, local motion excites adjacent neurons ahead of the ongoing motion, whilst transferring inhibitory signals to presently excited neurons with a slight temporal delay. According to the comparative experimental results, the main contribution lies in sculpting the ultra-selectivity: the network generates the vast majority of its responses to dark, centroid-emanated centrifugal motion patterns whilst remaining nearly silent to those starting from other quadrants of the receptive field (RF). The proposed method also distinguishes relatively dark approaching objects against a brighter background and light ones against a dark background by exploiting ON/OFF parallel channels, which fits the physiological findings well. Accordingly, the proposed neural network consolidates the theory of non-linear computation in Drosophila's visual system, a prominent paradigm for studying biological motion perception. This research also shows potential for being fused with an attention mechanism for use in devices such as unmanned aerial vehicles (UAVs), protecting them from unexpected and imminent collisions by calculating a safer flying pathway.
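The ON/OFF parallel channels mentioned above are commonly modelled by half-wave rectifying the luminance change between consecutive frames. The sketch below shows that generic idea only; it is an assumption that the paper's network uses this exact form, and the function name `on_off_channels` is hypothetical.

```python
def on_off_channels(prev_frame, curr_frame):
    """Split the luminance change between two frames into ON and OFF channels.

    Frames are 2D lists of floats with the same shape.
    ON keeps brightness increases; OFF keeps brightness decreases,
    stored as positive magnitudes.
    """
    on, off = [], []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        diffs = [c - p for p, c in zip(prev_row, curr_row)]
        on.append([max(0.0, d) for d in diffs])
        off.append([max(0.0, -d) for d in diffs])
    return on, off

prev = [[0.5, 0.5], [0.5, 0.5]]
curr = [[0.9, 0.5], [0.1, 0.5]]
on, off = on_off_channels(prev, curr)
```

Under this scheme a dark object expanding against a brighter background drives mainly the OFF channel, while a light object against a dark background drives mainly the ON channel, so the two cases can be processed separately.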

Citations: 3

Nested compression of convolutional neural networks with Tucker-2 decomposition

2022 International Joint Conference on Neural Networks (IJCNN)

Pub Date : 2022-07-18 DOI: 10.1109/IJCNN55064.2022.9892959

R. Zdunek, M. Gábor

The compression of convolutional neural networks (CNNs) has attracted increasing attention as new generations of neural networks become larger and require more and more computing performance. This computational problem can be addressed by representing the weights of a neural network with low-rank factors obtained by matrix/tensor decomposition methods. This study presents a novel concept for compressing neural networks using nested low-rank decomposition methods. In this approach, we alternate between decomposing the neural network weights and fine-tuning the network. The numerical experiments are performed on various CNN architectures, ranging from the small-scale LeNet-5 trained on the MNIST dataset, through the medium-scale ResNet-20 and ResNet-56, up to the large-scale VGG-16 and VGG-19 trained on the CIFAR-10 dataset. The results show that nested compression achieves much higher parameter and FLOPS compression with only a minor drop in classification accuracy.
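To give a sense of where the parameter savings come from, the sketch below counts weights before and after a Tucker-2 factorization of a single convolutional layer. It assumes the usual layout in which the kernel is replaced by a 1x1 convolution, a smaller k x k core convolution, and another 1x1 convolution. The layer sizes and ranks are hypothetical, and the nested decompose-then-fine-tune schedule from the abstract is not modelled here.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def tucker2_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization of the kernel.

    The layer becomes three convolutions:
      1x1   (c_in  -> r_in)   input-channel factor
      k x k (r_in  -> r_out)  core tensor
      1x1   (r_out -> c_out)  output-channel factor
    """
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out

# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, ranks (16, 32).
original = conv_params(64, 128, 3)
compressed = tucker2_params(64, 128, 3, r_in=16, r_out=32)
ratio = original / compressed
```

For this example the layer shrinks from 73,728 to 9,728 weights, roughly a 7.6x reduction. Lower ranks compress more but discard more of the original kernel, which is why fine-tuning after decomposition matters.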

Citations: 3
