Dynamic Relation-Aware Multiple Instance Learning for Few-Shot Learning
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892340
Kai Zheng, Liu Cheng, Jiehong Shen
Leveraging patch-level embeddings in few-shot learning has been widely studied in recent work. A fundamental challenge, however, is that labels are assigned at the image level, while patch-level annotations are missing. To address this problem, we observe that it exactly matches the setting of multiple instance learning (MIL) and, accordingly, incorporate multiple instance learning into few-shot learning. Specifically, we propose a dynamic relation-aware multiple instance learning framework that explicitly models spatial and semantic relations among instances and performs iterative aggregation. Extensive experiments demonstrate that the proposed method achieves competitive results compared with state-of-the-art methods.
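The abstract leaves the framework's internals to the paper, but the MIL view it adopts, treating image patches as instances in a bag whose label exists only at the image level, is easy to sketch. Below is a generic attention-based MIL aggregator in that spirit (a minimal sketch; the attention pooling, names, and dimensions are illustrative stand-ins, not the authors' dynamic relation-aware module):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_mil_pool(patches, W, v):
    """Aggregate patch (instance) embeddings into a single bag embedding.

    patches: (n_patches, d) patch-level embeddings of one image (the "bag")
    W: (d, h) projection and v: (h,) attention vector, learnable in practice
    """
    scores = np.tanh(patches @ W) @ v      # (n_patches,) unnormalised attention
    alpha = softmax(scores)                # instance weights summing to 1
    return alpha @ patches                 # (d,) bag-level embedding

rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 64))        # e.g. a 7x7 grid of 64-d patch features
W, v = rng.normal(size=(64, 32)), rng.normal(size=32)
print(attention_mil_pool(patches, W, v).shape)   # (64,)
```

The bag embedding can then be compared against class prototypes exactly as an image-level embedding would be in a standard few-shot pipeline.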
{"title":"Dynamic Relation-Aware Multiple Instance Learning for Few-Shot Learning","authors":"Kai Zheng, Liu Cheng, Jiehong Shen","doi":"10.1109/IJCNN55064.2022.9892340","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892340","url":null,"abstract":"Leveraging patch-level embedding in few-shot learning is widely studied by recent works. However, a fundamental challenge is that labels are actually assigned at image level, whereas patch-level annotations are missing. To deal with this problem, we observe that it exactly matches the applications of multiple instance learning (MIL) and novelly incorporate multiple instance learning with few-shot learning. Specifically, we propose a dynamic relation-aware multiple instance learning framework that explicitly models the spatial and semantic relation on instances and performs iterative aggregation. Extensive experiments demonstrate that the proposed method achieves competitive results compared with state-of-the-arts methods.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123996130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neurons Perception Dataset for RoboMaster AI Challenge
Haoran Li, Zicheng Duan, Jiaqi Li, Mingjun Ma, Yaran Chen, Dongbin Zhao
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892040
From virtual games to physical robots, games have witnessed the development of artificial intelligence (AI) technology, especially the data-driven technology represented by deep learning. Compared with a virtual game, a physical robot game such as the RoboMaster AI challenge needs a complete closed-loop architecture composed of perception, planning, control, and decision-making to support autonomous confrontation. Perception, the eye of the robot, depends on a massive dataset for its performance in complex environments. Although many open perception datasets exist, they hardly meet the needs of the RoboMaster AI challenge, owing to the high dynamics of the task, the distinctiveness of the objects, and the limited computing resources. In this paper, we release the Neurons perception dataset for the RoboMaster AI challenge (Neurons is a team dedicated to promoting the development of robots with deep neural networks; the code and dataset will be released at https://github.com/DRL-CASIA/NeuronsDataset). The dataset covers three tasks, namely monocular depth estimation, lightweight object detection, and multi-view 3D object detection, and fills the data gap in this field. In addition, we evaluate state-of-the-art (SOTA) methods on each task, hoping to provide an impartial benchmark for the development of perception algorithms.
{"title":"Neurons Perception Dataset for RoboMaster AI Challenge","authors":"Haoran Li, Zicheng Duan, Jiaqi Li, Mingjun Ma, Yaran Chen, Dongbin Zhao","doi":"10.1109/IJCNN55064.2022.9892040","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892040","url":null,"abstract":"From virtual game to physical robot, games have witnessed the development of artificial intelligence (AI) technology, especially the data-driven technology represented by deep learning. Compared with virtual games, a physical robot game such as RoboMaster AI challenge needs to build a complete closed-loop architecture composed of perception, planning, control, and decision-making to support autonomous confrontation. Perception, as the eye of the robot, its performance in the complex environment depends on a massive dataset. Although there are many open perception datasets, these datasets are difficult to meet the needs of RoboMaster AI challenge due to the high dynamics of the task, the distinctiveness of the objects, and limited computing resources. In this paper, we release a dataset named Neurons11Neurons is a team dedicated to promoting the development of robot with deep neural network. We will release the code and dataset at https://github.com/DRL-CASIA/NeuronsDataset. perception dataset for RoboMaster AI challenge, which covers 3 tasks including monocular depth estimation, lightweight object detection, and multi-view 3D object detection, and makes up the data blank in this field. In addition, we also evaluate State-Of-The-Art (SOTA) methods on each task, hoping to provide an impartial benchmark for the development of perception algorithm.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124025415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Generalisable Representations for Offline Signature Verification
Xianmu Cairang, Duojie Zhaxi, Xiaolong Yang, Yan Hou, Qijun Zhao, Dingguo Gao, Pubu Danzeng, Dorji Gesang
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892224
Current offline signature verification methods based on deep learning have achieved promising results, but they degrade greatly in cross-domain settings. An efficient offline signature verification model should offer both high performance and cross-domain deployability without any adaptation. In this paper, we propose a novel approach to learning generalisable representations for offline signature verification. First, we use a Siamese network combined with Triplet loss and Cross-Entropy (CE) loss to learn discriminative features. Second, we introduce Instance Normalization (IN) into the network to cope with cross-domain discrepancies and propose an Inference Layer Normalization Neck (ILNNeck) module to further improve model generalization. We evaluate the method on our self-collected Multilingual Signature dataset (MLSig) and three public datasets: BHSig-H, BHSig-B, and CEDAR. Results show that while our method achieves comparable results in the single-domain setting, it is clearly superior to state-of-the-art methods in the cross-domain setting.
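A minimal sketch of the first ingredient, a Siamese trunk trained with Triplet plus CE loss, is given below, assuming PyTorch; the toy architecture, the balancing weight `lam`, and the placement of Instance Normalization are illustrative guesses, and the ILNNeck module is not reproduced:

```python
import torch
import torch.nn as nn

# Shared embedding trunk of the Siamese network. Instance Normalization (IN)
# is inserted to reduce sensitivity to domain-specific style, as the paper
# motivates; the exact architecture is not specified in the abstract.
embed = nn.Sequential(
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.InstanceNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.InstanceNorm2d(64), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classifier = nn.Linear(64, 2)                 # genuine vs. forged logits

triplet = nn.TripletMarginLoss(margin=1.0)
ce = nn.CrossEntropyLoss()

def combined_loss(anchor, positive, negative, labels, lam=1.0):
    """Triplet loss on embeddings + CE loss on the anchor's class logits.
    The balancing weight `lam` is an assumed hyperparameter."""
    za, zp, zn = embed(anchor), embed(positive), embed(negative)
    return triplet(za, zp, zn) + lam * ce(classifier(za), labels)

x = torch.randn(8, 1, 64, 128)                # grayscale signature crops
loss = combined_loss(x, torch.randn_like(x), torch.randn_like(x),
                     torch.randint(0, 2, (8,)))
loss.backward()
```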
{"title":"Learning Generalisable Representations for Offline Signature Verification","authors":"Xianmu Cairang, Duojie Zhaxi, Xiaolong Yang, Yan Hou, Qijun Zhao, Dingguo Gao, Pubu Danzeng, Dorji Gesang","doi":"10.1109/IJCNN55064.2022.9892224","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892224","url":null,"abstract":"Current offline signature verification methods based on deep learning have achieved promising results, but these methods degrade greatly in cross-domain settings. An efficient offline signature verification model with both high performance and for deployment cross-domain without any adaptation. In this paper, we propose a novel approach to learning generalisable representations for offline signature verification. Firstly, we use the Siamese network combined with Triplet loss and Cross Entropy (CE) loss to learn discriminative features. Secondly, we introduce Instance Normalization (IN) into the network to cope with cross-domain discrepancies and propose an Inference Layer Normalization Neck (ILNNeck) module to further improve model generalization. We evalute the method on our self-collected Multilingual Signature dataset (MLSig) and three public datasets: BHSig-H, BHSig-B, and CEDAR. Results show that while our method achieves comparable results in single-domain setting, it is obviously superior to state-of-the-art methods in cross-domain setting.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123346139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model-Agnostic Causal Principle for Unbiased KPI Anomaly Detection
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892664
Jiemin Ji, D. Guan, Yuwen Deng, Weiwei Yuan
KPI anomaly detection plays an important role in operation and maintenance. Because incomplete or missing labels are common, methods based on the Variational Auto-Encoder (VAE) are widely used. These methods assume that the normal patterns, which are in the majority, will be learned, but this assumption is not easy to satisfy since abnormal patterns are inevitably embedded as well. Existing debiasing methods merely utilize anomaly labels to eliminate bias in the decoding process, but the latent representation generated by the encoder can still be biased, and even ill-defined when the input KPIs are too abnormal. We propose a model-agnostic causal principle to make such VAE-based models unbiased. When the ELBO (evidence lower bound) is modified to utilize anomaly labels, our causal principle indicates that the anomaly labels are confounders between the training data and the learned representations, leading to the aforementioned bias. Our principle accordingly implements a do-operation to cut off the causal path from anomaly labels to training data. Through the do-operation, we can eliminate the anomaly bias in the encoder and reconstruct normal patterns more frequently in the decoder. Our proposed causal improvements on existing VAE-based models, CausalDonut and CausalBagel, improve the F1-score by up to 5% compared with Donut and Bagel, while also surpassing state-of-the-art supervised and unsupervised models. To empirically demonstrate the debiasing capability of our method, we also provide a comparison of anomaly scores between the baselines and our models. In addition, the learning process of our principle is interpreted from an entropy perspective.
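The abstract does not reproduce the modified ELBO, but the baselines it builds on descend from Donut's label-aware M-ELBO (Xu et al., WWW 2018), sketched here in that paper's notation as a reference point:

```latex
\tilde{\mathcal{L}}(x) = \mathbb{E}_{q_\phi(z \mid x)}\left[
  \sum_{w=1}^{W} \alpha_w \log p_\theta(x_w \mid z)
  + \beta \log p_\theta(z) - \log q_\phi(z \mid x) \right],
\qquad \beta = \frac{1}{W} \sum_{w=1}^{W} \alpha_w,
```

where α_w is 1 for points labeled normal and 0 for anomalous or missing ones. Labels entering the objective this way are, in the authors' causal reading, confounders between the training data and the learned representation; the proposed do-operation severs that path.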
{"title":"Model-Agnostic Causal Principle for Unbiased KPI Anomaly Detection","authors":"Jiemin Ji, D. Guan, Yuwen Deng, Weiwei Yuan","doi":"10.1109/IJCNN55064.2022.9892664","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892664","url":null,"abstract":"KPI anomaly detection plays an important role in operation and maintenance. Due to incomplete or missing labels are common, methods based on VAE (i.e., Variational Auto-Encoder) is widely used. These methods assume that the normal patterns, which is in majority, will be learned, but this assumption is not easy to satisfy since abnormal patterns are inevitably embedded. Existing debias methods merely utilize anomalous labels to eliminate bias in the decoding process, but latent representation generated by the encoder could still be biased and even ill-defined when input KPIs are too abnormal. We propose a model-agnostic causal principle to make the above VAE-based models unbiased. When modifying ELBO (i.e., evidence of lower bound) to utilize anomalous labels, our causal principle indicates that the anomalous labels are confounders between training data and learned representations, leading to the aforementioned bias. Our principle also implements a do-operation to cut off the causal path from anomaly labels to training data. Through do-operation, we can eliminate the anomaly bias in the encoder and reconstruct normal patterns more frequently in the decoder. Our proposed causal improvement on existing VAE-based models, CausalDonut and CausalBagel, improve F1-score up to 5% compared to Donut and Bagel as well as surpassing state-of-the-art supervised and unsupervised models. To empirically prove the debias capability of our method, we also provide a comparison of anomaly scores between the baselines and our models. In addition, the learning process of our principle is interpreted from an entropy perspective.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123377184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HE-SNE: Heterogeneous Event Sequence-based Streaming Network Embedding for Dynamic Behaviors
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892872
Yifan Wang, Jianhao Shen, Yiping Song, Sheng Wang, Ming Zhang
Large amounts of user behavior data provide opportunities for user behavior modeling and have great potential in many downstream applications such as advertising and anomaly detection. Compared with traditional methods, embedding-based methods have recently become more widely used because of their efficiency and scalability. These methods build a “behavior-entity” bipartite graph and learn static embeddings for the nodes in the graph. However, behavior patterns in the real world cannot be static, because entity properties such as user interests usually evolve over time. In this paper, we formulate user behaviors as a temporal event sequence and propose a streaming network embedding approach to capture the evolving nature of user behaviors. A representation of each event is built and used to update the embeddings of the nodes it involves. Two contextual behavior modeling tasks are studied for dynamic user behaviors, and experimental results on real-world data demonstrate the effectiveness of our proposed approach over several competitive baselines.
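As a rough illustration (not HE-SNE's actual update rule), the sketch below builds an event representation from the current embeddings of the nodes the event touches and then nudges those embeddings toward it; the EMA-style rule and learning rate are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_nodes = 16, 100
emb = rng.normal(scale=0.1, size=(n_nodes, dim))   # node embeddings, updated online

def process_event(nodes, lr=0.2):
    """Fold one event into the embeddings of the nodes it involves.

    The event representation is taken here as the mean of its participants'
    current embeddings; each participant is then moved toward it. This
    EMA-style rule is an illustrative stand-in for the paper's update.
    """
    event_repr = emb[nodes].mean(axis=0)           # build the event representation
    emb[nodes] = (1 - lr) * emb[nodes] + lr * event_repr
    return event_repr

# A stream of "user clicked item" style events; user and item node ids are
# assumed to share one id space in this toy example.
stream = [(3, 42), (3, 17), (7, 42)]
for u, i in stream:
    process_event(np.array([u, i]))
```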
{"title":"HE-SNE: Heterogeneous Event Sequence-based Streaming Network Embedding for Dynamic Behaviors","authors":"Yifan Wang, Jianhao Shen, Yiping Song, Sheng Wang, Ming Zhang","doi":"10.1109/IJCNN55064.2022.9892872","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892872","url":null,"abstract":"Large amounts of user behavior data provide opportunities for user behavior modeling and have great potential in many downstream applications such as advertising and anomaly detection. Compared with traditional methods, embedding-based methods are used more often recently because of their efficiency and scalability. These methods build a “behavior-entity” bipartite graph and learn static embeddings for nodes in the graph. However, behavior patterns in the real world could not be static because entity properties such as user interests usually evolve along with time. In this paper, we formulate user behaviors as a temporal event sequence and propose a stream network embedding approach to capture the evolving nature of user behaviors. Representation of each event is built and used to update the embeddings of nodes. Two contextual behavior modeling tasks are studied for dynamic user behaviors, and experimental results with real-world data demonstrate the effectiveness of our proposed approach over several competitive baselines.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123510382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypergraph Neural Network Hawkes Process
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892328
Zibo Cheng, Jian-wei Liu, Ze Cao
In real-world applications, temporally asynchronous event sequences are ubiquitous, arising in social networks, financial engineering, medical diagnostics, and so on. These data usually exhibit intrinsic high-order dependency characteristics. To this end, we propose a hypergraph neural network Hawkes process (HGHP) model, which extracts high-order correlations from the data through a hypergraph neural network and encodes dependency relationships into the hypergraph structure. When processing event sequence data, the method obtains the correlation matrix between different events through hyperedge convolution, and then obtains a latent representation of the event sequence based on these correlations. We conduct experiments on multiple public datasets. Our proposed HGHP model achieves 86.6% accuracy on the MIMIC-II dataset, 62.42% on the Financial dataset, and 46.79% on Stackoverflow, outperforming existing baseline models.
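The hyperedge convolution mentioned here presumably takes the standard HGNN form of Feng et al. (2019); a self-contained numpy sketch of one such layer, with illustrative shapes, follows:

```python
import numpy as np

def hypergraph_conv(X, H, Theta, w=None):
    """One hyperedge-convolution layer in the HGNN form
    X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta).

    X: (n_nodes, d_in) node (event) features
    H: (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in edge e
    Theta: (d_in, d_out) learnable weights; w: optional hyperedge weights
    """
    w = np.ones(H.shape[1]) if w is None else w
    Dv = (H * w).sum(axis=1)                       # node degrees
    De = H.sum(axis=0)                             # hyperedge degrees
    A = (np.diag(Dv ** -0.5) @ H @ np.diag(w) @ np.diag(1.0 / De)
         @ H.T @ np.diag(Dv ** -0.5))
    return np.maximum(A @ X @ Theta, 0.0)          # ReLU

rng = np.random.default_rng(0)
H = (rng.random((6, 3)) > 0.5).astype(float)       # 6 events, 3 hyperedges
H[H.sum(axis=1) == 0, 0] = 1.0                     # ensure every event is in some edge
H[0, H.sum(axis=0) == 0] = 1.0                     # ensure every edge is non-empty
X = rng.normal(size=(6, 8))
print(hypergraph_conv(X, H, rng.normal(size=(8, 4))).shape)   # (6, 4)
```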
{"title":"Hypergraph Neural Network Hawkes Process","authors":"Zibo Cheng, Jian-wei Liu, Ze Cao","doi":"10.1109/IJCNN55064.2022.9892328","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892328","url":null,"abstract":"In real-world application, the temporal asynchronous event sequences are ubiquitous, such as social network, financial engineering, and medical diagonostics, and so on. These data usually show certain intrinsic high-order dependency characteristics. To this end, we propose a hypergraph neural network Hawkes process (HGHP) model, which can extract the high-order correlation from the data through the hypergraph neural network and encode dependent relationships into the hypergraph structure. When processing event sequence data, this method obtains the correlation matrix between different events through hyperedge convolution, and then obtains the latent representation for the event sequence based on the correlation between the data. We conduct experiments on multiple public datasets. Our proposed HGHP model achieves 86.6% accuracy on MIMIC-II dataset, 62.42% on Financial dataset, and 46.79% on Stackoverflow, which is outperforming existing baseline models.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123526450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pulmonary Nodule Classification with Multi-View Convolutional Vision Transformer
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892716
Yuxuan Xiong, Bo Du, Yongchao Xu, J. Deng, Y. She, Chang Chen
Pulmonary nodule classification from computerized tomography (CT) scans is a vital task for the early screening of lung cancers. The algorithm aims at distinguishing malignant pulmonary nodules, benign nodules, and their subtypes. In this paper, we define a detailed pulmonary nodule classification task with 5 semantic labels. Such a task poses a series of non-trivial problems. First, the available medical image data for training is quite limited. We enlarge the training dataset by cropping out the three-dimensional (3D) volume of each pulmonary nodule and generating 15 planes with different orientations from these volumes. Second, the global modeling ability of existing convolutional neural network (CNN) based architectures cannot meet the needs of medical image analysis well. To learn discriminative abstract information, we down-sample feature maps between successive stages and adopt the BotNet-50 backbone, a combination of a ResNet backbone and self-attention modules. Such an architecture can extract local and non-local information in low-level and high-level layers, respectively. Last but not least, training and testing data do not share a similar distribution in real-world multi-center medical image classification scenarios. We assign the samples modified weights when calculating the loss value for optimization. The proposed method can eliminate the spurious correlation between features and labels. Experiments demonstrate the effectiveness of each component.
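The paper's exact 15-plane sampling scheme is not given in the abstract; the sketch below illustrates the general idea of generating differently oriented planes through a cropped nodule volume by rotating it and re-slicing (the angles and resulting plane count here are assumptions):

```python
import numpy as np
from scipy.ndimage import rotate

def multi_view_planes(vol, angles=(36, 72, 108, 144)):
    """Cut 2D planes of several orientations out of a cubic nodule crop.

    Takes the three axis-aligned central slices, then rotates the volume
    about its first axis and re-slices the two planes containing that axis,
    yielding 11 planes in this illustrative configuration.
    """
    c = vol.shape[0] // 2
    planes = [vol[c], vol[:, c], vol[:, :, c]]     # axial, coronal, sagittal
    for ang in angles:
        r = rotate(vol, ang, axes=(1, 2), reshape=False, order=1)
        planes.append(r[:, c])                     # oblique plane 1
        planes.append(r[:, :, c])                  # oblique plane 2
    return planes

vol = np.random.rand(32, 32, 32)                   # a cropped 3D nodule volume
planes = multi_view_planes(vol)
print(len(planes), planes[0].shape)                # 11 (32, 32)
```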
{"title":"Pulmonary Nodule Classification with Multi-View Convolutional Vision Transformer","authors":"Yuxuan Xiong, Bo Du, Yongchao Xu, J. Deng, Y. She, Chang Chen","doi":"10.1109/IJCNN55064.2022.9892716","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892716","url":null,"abstract":"Pulmonary nodule classification from computerized tomography(CT) Scans is a vital task for the early screening of Lung cancers. The algorithm is aiming at distinguishing malignant pulmonary nodules, benign nodules and the ones with their subtypes. In this paper, we defined a detailed pulmonary nodule classification task considering 5 semantic labels. We are facing with a series of non-trival problems dealing with such a task. First, the available medical image data for training is quite limited. We enlarged the training dataset by cropping out three-dimension(3D) volume of each pulmonary nodule and generating 15 planes with different orientations from these volumes. Secondly, the global modeling ability of the existing convolutional neural network(CNN) based architectures can not meet the need of medical image analysis well. To learn discriminative abstract information, we down-sample feature maps between successive stages and adopt the BotNet-50 backbone which is a combination of ResNet backbone and self-attention modules. Such an architecture can extract local and non-local information in low-level and high-level layers, respectively. Last but not the least, the data distribution of training data and testing data don't share similar distribution in real-world multi-center medical image classification scenes. We assigned the samples with modified wights while calculating the loss value for optimization. The proposed method can eliminate the spurious correlation between features and labels. Experiments demonstrate the effectiveness of each component.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123685082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FCH-TTS: Fast, Controllable and High-quality Non-Autoregressive Text-to-Speech Synthesis
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892512
Xun Zhou, Zhiyang Zhou, Xiaodon Shi
Inspired by the success of the non-autoregressive speech synthesis model FastSpeech, we propose FCH-TTS, a fast, controllable and universal neural text-to-speech (TTS) model capable of generating high-quality spectrograms. The basic architecture of FCH-TTS is similar to that of FastSpeech, but FCH-TTS uses a simple yet effective attention-based soft alignment mechanism to replace the complex teacher model in FastSpeech, allowing the model to be better adapted to different languages. Specifically, in addition to controlling voice speed and prosody, a fusion module has been designed to better model speaker features in order to obtain the desired timbre. Meanwhile, several special loss functions are applied to ensure the quality of the output mel-spectrogram. Experimental results on the LJSpeech dataset show that FCH-TTS achieves the fastest inference speed among all baseline models, while also achieving the best speech quality. In addition, the controllability of the model with respect to prosody, voice speed and timbre was validated on several datasets, and its good performance on a low-resource Tibetan dataset demonstrates the universality of the model.
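As a rough picture of an attention-based soft alignment (not necessarily FCH-TTS's exact module), the sketch below aligns mel-frame encodings to phoneme encodings with scaled dot-product attention; column sums of the alignment matrix yield soft per-phoneme durations of the kind that can supervise a duration predictor in place of a distillation teacher:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_alignment(text_enc, mel_enc, temperature=1.0):
    """Attention-style soft alignment between phonemes and mel frames.

    Returns an (n_frames, n_phonemes) matrix whose rows are distributions
    over phonemes; summing each column gives a soft duration per phoneme.
    """
    scores = mel_enc @ text_enc.T / (np.sqrt(text_enc.shape[1]) * temperature)
    align = softmax(scores, axis=-1)
    soft_durations = align.sum(axis=0)     # expected number of frames per phoneme
    return align, soft_durations

rng = np.random.default_rng(0)
align, dur = soft_alignment(rng.normal(size=(12, 64)),   # 12 phoneme encodings
                            rng.normal(size=(80, 64)))   # 80 mel-frame encodings
print(align.shape, dur.sum())              # (80, 12), sums to 80.0
```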
{"title":"FCH-TTS: Fast, Controllable and High-quality Non-Autoregressive Text-to-Speech Synthesis","authors":"Xun Zhou, Zhiyang Zhou, Xiaodon Shi","doi":"10.1109/IJCNN55064.2022.9892512","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892512","url":null,"abstract":"Inspired by the success of the non-autoregressive speech synthesis model FastSpeech, we propose FCH-TTS, a fast, controllable and universal neural text-to-speech (TTS) capable of generating high-quality spectrograms. The basic architecture of FCH-TTS is similar to that of FastSpeech, but FCH-TTS uses a simple yet effective attention-based soft alignment mechanism to replace the complex teacher model in FastSpeech, allowing the model to be better adapted to different languages. Specifically, in addition to the control of voice speed and prosody, a fusion module has been designed to better model speaker features in order to obtain the desired timbre. Meanwhile, several special loss functions were applied to ensure the quality of the output mel-spectrogram. Experimental results on the dataset LJSpeech show that FCH-TTS achieves the fastest inference speed compared to all baseline models, while also achieving the best speech quality. In addition, the controllability of the model with respect to prosody, voice speed and timbre was validated on several datasets, and the good performance on the low-resource Tibetan dataset demonstrates the universality of the model.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114317102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaping the Ultra-Selectivity of a Looming Detection Neural Network from Non-linear Correlation of Radial Motion
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892408
Mu Hua, Qinbing Fu, Jigen Peng, Shigang Yue, Hao Luan
In this paper, a numerical neural network inspired by the lobula plate/lobula columnar type II (LPLC2) neurons, the ultra-selective looming-sensitive neurons identified within the visual system of Drosophila, is proposed utilising non-linear computation. This method is one exploration towards solving the collision perception problem arising from radial motion. Taking inspiration from the distinctive structure and placement of the directionally selective neurons (DSNs) known as T4/T5 interneurons and their post-synaptic neurons, motion opponency along the four cardinal directions is computed in a non-linear way and subsequently mapped into four quadrants. More precisely, local motion excites adjacent neurons ahead of the ongoing motion, whilst transferring inhibitory signals, with a slight temporal delay, to presently excited neurons. From the comparative experimental results collected, the main contribution established is the sculpting of the ultra-selective feature: the network responds overwhelmingly to dark, centroid-emanated centrifugal motion patterns whilst remaining nearly silent to motion starting from other quadrants of the receptive field (RF). The proposed method also distinguishes relatively dark approaching objects against a brighter background, and light ones against a dark background, by exploiting ON/OFF parallel channels, which fits the physiological findings well. Accordingly, the proposed neural network consolidates the theory of non-linear computation in Drosophila's visual system, a prominent paradigm for studying biological motion perception. This research also demonstrates the potential of fusing the model with attention mechanisms for use in devices such as unmanned aerial vehicles (UAVs), protecting them from unexpected and imminent collision by calculating a safer flying pathway.
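The two ingredients named here, ON/OFF half-wave rectification and rectified four-direction opponency with a one-frame delay, can be conveyed in a small Hassenstein-Reichardt-flavoured numpy sketch; it illustrates the idea only, not the paper's LPLC2 model, and the stimulus and sizes are made up:

```python
import numpy as np

def onoff_split(frame_diff):
    """Half-wave rectify luminance change into ON (brightening) and OFF
    (darkening) channels, mirroring the fly's parallel pathways."""
    return np.maximum(frame_diff, 0.0), np.maximum(-frame_diff, 0.0)

def four_way_opponency(off_prev, off_curr):
    """Correlate the delayed OFF signal with a one-pixel-shifted copy of the
    current one along each cardinal direction, subtracting the opposite
    pairing and rectifying the result."""
    out = {}
    for name, (dy, dx) in {"down": (1, 0), "up": (-1, 0),
                           "right": (0, 1), "left": (0, -1)}.items():
        preferred = np.roll(off_prev, (dy, dx), axis=(0, 1)) * off_curr
        null = np.roll(off_curr, (dy, dx), axis=(0, 1)) * off_prev
        out[name] = np.maximum(preferred - null, 0.0)   # rectified opponency
    return out

# A dark square expanding about the image centre (centrifugal OFF motion).
f0 = np.ones((32, 32)); f0[14:18, 14:18] = 0.0
f1 = np.ones((32, 32)); f1[13:19, 13:19] = 0.0
f2 = np.ones((32, 32)); f2[12:20, 12:20] = 0.0
_, off_a = onoff_split(f1 - f0)
_, off_b = onoff_split(f2 - f1)
resp = four_way_opponency(off_a, off_b)
print({k: round(float(v.sum()), 2) for k, v in resp.items()})
```

For the expanding dark square, each directional map responds in its own quadrant, which is the centrifugal signature an LPLC2-style unit is selective for.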
{"title":"Shaping the Ultra-Selectivity of a Looming Detection Neural Network from Non-linear Correlation of Radial Motion","authors":"Mu Hua, Qinbing Fu, Jigen Peng, Shigang Yue, Hao Luan","doi":"10.1109/IJCNN55064.2022.9892408","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892408","url":null,"abstract":"In this paper, a numerical neural network inspired by the lobula plate/lobula columnar type II (LPLC2), the ultra-selective looming sensitive neurons identified within visual system of Drosophila, is proposed utilising non-linear computation. This method aims to be one of the explorations towards solving the collision perception problem resulted from radial motion. Taking inspiration from the distinctive structure and placement of directionally selective neurons (DSNs) named T4/T5 interneurons and their post-synaptic neurons, the motion opponency along four cardinal directions is computed in a non-linear way and subsequently mapped into four quadrants. More precisely, local motion excites adjacent neurons ahead of the ongoing motion, whilst transfers inhibitory signals to presently-excited neurons with slight temporal delay. From comparative experimental results collected, the main contribution is established by sculpting the ultra-selective features of generating a vast majority of responses to dark centroid-emanated centrifugal motion patterns whilst remaining nearly silent to those starting from other quadrants of receptive field (RF). The proposed method also distinguishes relatively dark approaching objects against brighter background and light ones against dark background via exploiting ON/OFF parallel channels, which well fits the physiological findings. Accordingly, the proposed neural network consolidates the theory of non-linear computation in Drosophila's visual system, a prominent paradigm for studying biological motion perception. This research also demonstrates potential of being fused with attention mechanism towards utility in devices such as unmanned aerial vehicles (UAVs), protecting them from unexpected and imminent collision by calculating a safer flying pathway.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121697874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nested compression of convolutional neural networks with Tucker-2 decomposition
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892959
R. Zdunek, M. Gábor
The topic of convolutional neural network (CNN) compression has attracted increasing attention as new generations of neural networks become larger and demand ever more computing performance. This computational problem can be addressed by representing the weights of a neural network with low-rank factors obtained through matrix/tensor decomposition methods. This study presents a novel concept for compressing neural networks using nested low-rank decomposition, in which decomposition of the network weights alternates with fine-tuning of the network. Numerical experiments are performed on various CNN architectures, ranging from the small-scale LeNet-5 trained on the MNIST dataset, through the medium-scale ResNet-20 and ResNet-56, up to the large-scale VGG-16 and VGG-19 trained on the CIFAR-10 dataset. The results show that, using nested compression, we can achieve much higher parameter and FLOPS compression with a minor drop in classification accuracy.
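One step of the building block, a Tucker-2 decomposition of a convolution kernel over its two channel modes via truncated HOSVD, can be sketched in plain numpy as below; the ranks are illustrative, and in the nested scheme this step alternates with fine-tuning:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker2(W, r_out, r_in):
    """Tucker-2 decomposition of a conv kernel W (c_out, c_in, kh, kw) via
    truncated HOSVD on the two channel modes: W ~ core x0 U x1 V.
    The ranks r_out, r_in would be tuned per layer in practice."""
    U = np.linalg.svd(unfold(W, 0), full_matrices=False)[0][:, :r_out]
    V = np.linalg.svd(unfold(W, 1), full_matrices=False)[0][:, :r_in]
    core = np.einsum('oikl,or,is->rskl', W, U, V)      # project channel modes
    W_hat = np.einsum('rskl,or,is->oikl', core, U, V)  # low-rank reconstruction
    return U, V, core, W_hat

W = np.random.randn(64, 32, 3, 3)          # a conv layer's weights
U, V, core, W_hat = tucker2(W, r_out=16, r_in=8)
print(f"params: {W.size} -> {U.size + V.size + core.size}, rel. error "
      f"{np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
# At inference the factors become three convolutions: a 1x1 conv (V^T), the
# small (r_out, r_in, kh, kw) core conv, then a 1x1 conv (U).
```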
{"title":"Nested compression of convolutional neural networks with Tucker-2 decomposition","authors":"R. Zdunek, M. Gábor","doi":"10.1109/IJCNN55064.2022.9892959","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892959","url":null,"abstract":"The topic of convolutional neural networks (CNN) compression has attracted increasing attention as new generations of neural networks become larger and require more and more computing performance. This computational problem can be solved by representing the weights of a neural network with low-rank factors using matrix/tensor decomposition methods. This study presents a novel concept for compressing neural networks using nested low-rank decomposition methods. In this approach, we alternately perform decomposition of the neural network weights with fine-tuning of the network. The numerical experiments are performed on various CNN architectures, ranging from small-scale LeNet-5 trained on the MNIST dataset, through medium-scale ResNet-20, ResNet-56, and up to large-scale VGG-16, VGG-19 trained on the CIFAR-10 dataset. The obtained results show that using the nested compression, we can achieve much higher parameter and FLOPS compression with a minor drop in classification accuracy.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":" 42","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113948616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}