Pub Date: 2022-04-05. DOI: 10.48550/arXiv.2204.02035
Stanislav Frolov, Prateek Bansal, Jörn Hees, A. Dengel
Despite astonishing progress, generating realistic images of complex scenes remains a challenging problem. Recently, layout-to-image synthesis approaches have attracted much interest by conditioning the generator on a list of bounding boxes and corresponding class labels. However, previous approaches are very restrictive because the set of labels is fixed a priori. Meanwhile, text-to-image synthesis methods have substantially improved and provide a flexible way for conditional image generation. In this work, we introduce dense text-to-image (DT2I) synthesis as a new task to pave the way toward more intuitive image generation. Furthermore, we propose DTC-GAN, a novel method to generate images from semantically rich region descriptions, and a multi-modal region feature matching loss to encourage semantic image-text matching. Our results demonstrate the capability of our approach to generate plausible images of complex scenes using region captions.
Title: DT2I: Dense Text-to-Image Generation from Region Descriptions. Published in: Artificial Neural Networks, ICANN: International Conference on Artificial Neural Networks proceedings (European Neural Network Society), pp. 395-406.
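The multi-modal region feature matching loss is only named in the DT2I abstract, not specified. Purely as a hedged illustration of what such a loss could look like, the sketch below assumes a cosine-similarity formulation in which pooled features of each generated region are pulled toward the embeddings of the corresponding region captions; the function name, inputs, and exact form are assumptions, not the paper's definition.

```python
import numpy as np

def region_feature_matching_loss(region_feats, text_feats):
    """Encourage each region's image feature to match its caption embedding.

    region_feats: (N, D) features pooled from N generated image regions.
    text_feats:   (N, D) embeddings of the N region captions.
    Returns the mean (1 - cosine similarity) over regions: 0 when every
    region feature is perfectly aligned with its caption embedding.
    """
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    cos_sim = np.sum(r * t, axis=1)       # per-region cosine similarity
    return float(np.mean(1.0 - cos_sim))
```

In a GAN training loop such a term would be added to the generator loss so that gradients push region appearance toward the region's description.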
Pub Date: 2022-04-05. DOI: 10.48550/arXiv.2204.02121
Calum Heggan, S. Budgett, Timothy M. Hospedales, Mehrdad Yaghoobi
Currently available benchmarks for few-shot learning (machine learning from few training examples) are limited in the domains they cover, focusing primarily on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio-based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experiments show that gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric-based and baseline methods. We also demonstrate that the joint training routine improves overall generalisation on the environmental sound datasets included, and is a somewhat effective way of tackling the cross-dataset/domain setting.
Title: MetaAudio: A Few-Shot Audio Classification Benchmark. Published in: ICANN proceedings, pp. 219-230.
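The abstract reports that gradient-based meta-learners such as MAML perform best. As a rough sketch of the idea (not the paper's implementation, which operates on audio encoders), the following shows one first-order MAML meta-update for a toy linear-regression learner; the function name and task format are illustrative choices of this note, not MetaAudio's API.

```python
import numpy as np

def fomaml_meta_step(w, tasks, inner_lr=0.01, meta_lr=0.001):
    """One first-order MAML meta-update for a linear model y ~ X @ w.

    tasks: list of (X_support, y_support, X_query, y_query) tuples.
    Inner loop: one gradient step on each task's support set.
    Outer loop: accumulate the query-set gradient at the adapted weights
    (first-order approximation: second derivatives are ignored).
    """
    meta_grad = np.zeros_like(w)
    for Xs, ys, Xq, yq in tasks:
        grad_s = 2 * Xs.T @ (Xs @ w - ys) / len(ys)  # support-set MSE gradient
        w_adapted = w - inner_lr * grad_s             # task-specific adaptation
        meta_grad += 2 * Xq.T @ (Xq @ w_adapted - yq) / len(yq)
    return w - meta_lr * meta_grad / len(tasks)
```

Repeating this step over batches of tasks moves `w` toward an initialization from which one inner step already fits a new task well, which is the property few-shot audio classification exploits.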
Pub Date: 2022-04-04. DOI: 10.48550/arXiv.2204.01436
Jonathan Jakob, André Artelt, M. Hasenjäger, Barbara Hammer
Water distribution networks are a key component of modern infrastructure for housing and industry. They transport and distribute water via widely branched networks from sources to consumers. To guarantee a working network at all times, the water supply company continuously monitors the network and takes action when necessary, e.g. reacting to leakages, sensor faults and drops in water quality. Since real-world networks are too large and complex to be monitored by a human, algorithmic monitoring systems have been developed. A popular type of such system is the residual-based anomaly detection system, which can detect events such as leakages and sensor faults. For continuous high-quality monitoring, these systems must adapt to changing demands and to the presence of various anomalies. In this work, we propose an adaptation of the incremental SAM-kNN classifier to regression, in order to build a residual-based anomaly detection system for water distribution networks that is able to adapt to any kind of change.
Title: SAM-kNN Regressor for Online Learning in Water Distribution Networks. Published in: ICANN proceedings, pp. 752-762.
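The abstract does not detail the SAM-kNN regressor itself. As a simplified sketch of the general pattern it plugs into, the following minimal sliding-window kNN regressor plus residual test illustrates residual-based anomaly detection; it deliberately omits SAM-kNN's actual self-adjusting short- and long-term memories, and all names and parameters here are hypothetical.

```python
from collections import deque

import numpy as np

class WindowedKNNRegressor:
    """Sliding-window kNN regressor: predicts a sensor value from recent
    (input, output) pairs, so drifting demand patterns age out of memory.
    (A stand-in only; SAM-kNN instead maintains self-adjusting short- and
    long-term memories rather than a fixed window.)"""

    def __init__(self, k=3, window=200):
        self.k, self.memory = k, deque(maxlen=window)

    def predict(self, x):
        if len(self.memory) < self.k:
            return 0.0  # not enough history yet
        X = np.array([m[0] for m in self.memory])
        y = np.array([m[1] for m in self.memory])
        nearest = np.argsort(np.linalg.norm(X - x, axis=1))[: self.k]
        return float(y[nearest].mean())

    def update(self, x, y):
        self.memory.append((np.asarray(x, dtype=float), float(y)))

def is_anomaly(y_observed, y_predicted, threshold=1.0):
    """Residual-based detection: flag when |observed - predicted| is large."""
    return abs(y_observed - y_predicted) > threshold
```

The monitoring loop would call `predict` before each new sensor reading arrives, raise an alarm via `is_anomaly`, and then `update` the memory so the model keeps tracking the current network state.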
Pub Date: 2022-03-30. DOI: 10.48550/arXiv.2203.16506
Sheng Xu
Coronavirus disease 2019 (COVID-19) has brought severe challenges to social stability and public health worldwide. One effective way of curbing the epidemic is to require people to wear masks in public places and to monitor mask-wearing with suitable automatic detectors. However, existing deep-learning-based models struggle to meet the requirements of high precision and real-time performance simultaneously. To solve this problem, we propose an improved lightweight face mask detector based on YOLOv5, which achieves an excellent balance of precision and speed. First, ShuffleCANet, a novel backbone combining the ShuffleNetV2 network with a Coordinate Attention mechanism, is proposed. Then, an efficient path aggregation network, BiFPN, is applied as the feature fusion neck. Furthermore, the localization loss is replaced with α-CIoU during training to obtain higher-quality anchors. Strategies such as data augmentation, adaptive image scaling, and anchor clustering are also utilized. Experimental results on the AIZOO face mask dataset show the superiority of the proposed model: compared with the original YOLOv5, it increases inference speed by 28.3% while still improving precision by 0.58%, and it achieves the best mean average precision of 95.2% among seven other existing models, 4.4% higher than the baseline.
Title: An Improved Lightweight YOLOv5 Model Based on Attention Mechanism for Face Mask Detection. Published in: ICANN proceedings, pp. 531-543.
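The α-CIoU localization loss is only named in the abstract. As a minimal sketch of the α-IoU family it belongs to (loss = 1 − IoU^α on axis-aligned boxes), the code below shows the power-parameter idea; the full α-CIoU additionally penalizes center distance and aspect-ratio mismatch, which is omitted here.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

def alpha_iou_loss(pred, target, alpha=3.0):
    """Alpha-IoU localization loss: 1 - IoU**alpha.  Raising IoU to a power
    alpha > 1 relatively up-weights high-overlap boxes, pushing the detector
    toward higher-quality localizations.  (alpha-CIoU adds center-distance
    and aspect-ratio penalty terms on top of this, not shown here.)"""
    return 1.0 - iou(pred, target) ** alpha
```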
Pub Date: 2022-03-26. DOI: 10.48550/arXiv.2204.05132
Shengjie Zheng, Wenyi Li, Lang Qian, Che He, Xiaojian Li
Brain-computer interfaces (BCIs) transform neural signals in the brain into instructions to control external devices. However, obtaining sufficient training data is difficult, and the data available are limited. With the advent of advanced machine learning methods, the capability of brain-computer interfaces has been enhanced like never before; however, these methods require a large amount of data for training and thus require augmentation of the limited data available. Here, we use a spiking neural network (SNN) as a data generator. The SNN is touted as the next-generation neural network and is considered one of the algorithms oriented toward general artificial intelligence, because it borrows its information processing from biological neurons. We use the SNN to generate neural spike information that is biologically interpretable and conforms to the intrinsic patterns in the original neural data. Experiments show that the model can directly synthesize new spike trains, which in turn improves the generalization ability of the BCI decoder. Both the input and output of the spiking neural model are spike information, a brain-inspired approach that can be better integrated with BCIs at the level of neural population activity patterns rather than individual neurons [4]. Neural population dynamics exist on low-dimensional neural manifolds in a high-dimensional neural space [5]. Here, we employ a biologically interpretable SNN that mimics the neural information generation and communication of biological neural populations. We analyze motor cortical neural population data recorded from monkeys to derive motor-related neural population dynamics. The neural spike properties of the SNN itself allow the direct generation of biologically meaningful spike trains that match the activity of real biological neural populations. We explored the interaction between the spike train synthesizer and the BCI decoder. Our results show that, based on a small amount of training data as a template, data conforming to the dynamics of neural populations are generated, thus enhancing the decoding ability of the BCI decoder.
Title: A Spiking Neural Network based on Neural Manifold for Augmenting Intracortical Brain-Computer Interface Data. Published in: ICANN proceedings, pp. 519-530.
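The abstract gives no architectural details of the SNN generator. Purely as an illustration of the basic spiking unit such a generator could be built from, here is a minimal leaky integrate-and-fire (LIF) neuron that turns an input-current sequence into a binary spike train; all parameter values and the function itself are assumptions of this note, not taken from the paper.

```python
import numpy as np

def lif_spike_train(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron.  Integrates input current with a
    leak toward the resting potential; emits a spike (1) and resets when
    the membrane potential crosses threshold.  Returns a 0/1 spike train
    with one entry per time step."""
    v, spikes = v_reset, []
    for i_t in input_current:
        v += dt * (-(v - v_reset) / tau + i_t)  # leaky integration
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset                          # reset after spiking
        else:
            spikes.append(0)
    return np.array(spikes)
```

Spike trains produced by networks of such units are what makes the generator's output directly comparable to recorded population activity.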
Pub Date: 2022-03-24. DOI: 10.48550/arXiv.2203.13004
Daniel Kluvanec, Thomas B. Phillips, K. McCaffrey, N. A. Moubayed
A difficult step in the process of karyotyping is segmenting chromosomes that touch or overlap. In an attempt to automate the process, previous studies turned to deep learning methods, with some formulating the task as a semantic segmentation problem. These models treat separate chromosome instances as semantic classes, which we show to be problematic, since it is ambiguous which chromosome should be classed as #1 and which as #2. Assigning class labels based on comparison rules, such as shorter/longer chromosome, alleviates but does not fully resolve the issue. Instead, we separate the chromosome instances in a second stage, have the model predict the orientation of the chromosomes, and use it as one of the key factors distinguishing the chromosomes. We demonstrate this method to be effective. Furthermore, we introduce a novel Double-Angle representation that a neural network can use to predict the orientation; the representation maps any direction and its reverse to the same point. Lastly, we present a new expanded synthetic dataset, based on Pommier's dataset, that addresses its insufficient separation between training and testing sets.
Title: Using Orientation to Distinguish Overlapping Chromosomes. Published in: ICANN proceedings, pp. 391-403.
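The abstract describes the Double-Angle representation only as mapping any direction and its reverse to the same point. A natural realization of that property, assumed here rather than taken from the paper, encodes an orientation θ by its doubled angle:

```python
import numpy as np

def double_angle_encode(theta):
    """Encode an orientation as (cos 2θ, sin 2θ).  Doubling the angle makes
    θ and θ + π (a direction and its reverse) map to the same point, which
    is exactly what an undirected chromosome axis requires."""
    return np.array([np.cos(2 * theta), np.sin(2 * theta)])

def double_angle_decode(vec):
    """Recover the orientation in [0, π) from a doubled-angle vector."""
    return 0.5 * np.arctan2(vec[1], vec[0]) % np.pi
```

A network regressing the two encoded components avoids the discontinuity a raw angle target would have at the 0/π wrap-around.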
Pub Date: 2022-01-17. DOI: 10.1007/978-3-031-15937-4_39
Nima Rafiee, Rahil Gholamipoorfard, Nikolas Adaloglou, Simon Jaxy, Julius Ramakers, M. Kollmann
Title: Self-Supervised Anomaly Detection by Self-Distillation and Negative Sampling. Published in: ICANN proceedings, pp. 459-470.
Pub Date: 2022-01-01. DOI: 10.1007/978-3-031-15937-4_47
Weiran Chen, Chunping Liu, Yi Ji
Title: Chinese Character Style Transfer Model Based on Convolutional Neural Network. Published in: ICANN proceedings, pp. 558-569.