GNN-Detective: Efficient Weakly Correlated Neighbors Distinguishing and Processing in GNN
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892051
Jiayang Qiao, Yutong Liu, L. Kong
Graph neural networks (GNNs) have achieved state-of-the-art (SOTA) performance on various downstream tasks of graph learning, a benefit of their special propagation mechanism. This mechanism aggregates attributes from neighbor nodes to obtain expressive node representations, which is pivotal for achieving SOTA performance on downstream tasks. However, in most graph datasets, the neighborhood of a node may contain weakly correlated neighbors (WCNs), whose attributes can impair the expressiveness of the central node's representation. Although efforts have been devoted to this problem, they merely aggregate fewer of the WCNs' attributes, or even subtract them. Yet WCNs still share some correlated information with the central node, so the correlated information they provide remains underutilized. In this work, we leverage the correlated information provided by WCNs with our proposed method, GNN-Detective. The detective efficiently and automatically distinguishes WCNs and digs out their correlated information in the graph. It is realized as a semi-supervised learning framework in which a Differential Propagation (DP) module is designed specifically for information triage and utilization. This module fully leverages the correlated information provided by WCNs while eliminating interference from uncorrelated information. In semi-supervised node classification experiments on 9 benchmark datasets, our method achieves the best performance in processing WCNs, and problems such as over-smoothing and overfitting are also mitigated.
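As context for the propagation mechanism discussed above, here is a minimal sketch of neighbor aggregation with a correlation-based triage of weakly correlated neighbors. The cosine-similarity test, the threshold tau, and the down-weighting factor are illustrative assumptions, not the paper's DP module:

```python
# Illustrative only: cosine similarity as the correlation measure, a fixed
# threshold tau, and 0.5 down-weighting for WCNs are assumptions, not the
# paper's Differential Propagation module.
import numpy as np

def aggregate(x, adj, tau=0.3, wcn_weight=0.5):
    """One propagation step; x: (n, d) float features, adj: (n, n) adjacency."""
    out = np.zeros_like(x)
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)  # unit vectors
    for i in range(x.shape[0]):
        nbrs = np.nonzero(adj[i])[0]
        if nbrs.size == 0:
            out[i] = x[i]
            continue
        sim = xn[nbrs] @ xn[i]                     # correlation proxy per neighbor
        w = np.where(sim >= tau, 1.0, wcn_weight)  # triage: down-weight WCNs
        out[i] = x[i] + (w[:, None] * x[nbrs]).sum(0) / w.sum()
    return out
```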
{"title":"GNN-Detective: Efficient Weakly Correlated Neighbors Distinguishing and Processing in GNN","authors":"Jiayang Qiao, Yutong Liu, L. Kong","doi":"10.1109/IJCNN55064.2022.9892051","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892051","url":null,"abstract":"In the field of various downstream tasks of graph learning, graph neural networks (GNNs) have achieved the state-of-the-art (SOTA) performance benefits from its special propagation mechanism. The propagation mechanism aggregates attributes from neighbor nodes to obtain expressive node representations, which is pivotal for achieving SOTA performance in various downstream tasks. However, in most graph datasets, the neighborhood of each node may contain weakly correlated neighbors (WCNs), whose attributes may impair the expressiveness of central node representations. Though efforts have been devoted to solving such problem, they merely focus on aggregating fewer or even subtracting the attributes of WCNs. However, WCNs still share some correlated information with the central node, thus the correlated information provided by WCNs is underutilized. In this work, we devote to leveraging the correlated information provided by WCNs with our proposed method, namely GNN-detective. This detective can efficiently and automatically distinguish WCNs, as well as dig out their correlated information in the graph. It is realized by a semi-supervised learning framework, where the Differential Propagation (DP) module is designed specially for information triage and utilization. This module can fully leverage the correlated information provided by WCNs, and eliminate interference of uncorrelated information. We have conducted semi-supervised node classification tasks on 9 benchmark datasets. Our proposed method is proven to achieve the best performance in processing WCNs. The problems such as over-smoothing and overfitting are also mitigated as evaluated.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130797962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UApredictor: Urban Anomaly Prediction from Spatial-Temporal Data using Graph Transformer Neural Network
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892885
Bhumika, D. Das
Urban anomalies are abnormal events, such as a blocked driveway, illegal parking, noise, crime, or a crowd gathering, that drastically affect people and policy managers if not handled in time. Predicting these anomalies at an early stage is critical for public safety and for mitigating economic losses. However, urban anomaly prediction faces challenges such as complex spatio-temporal relationships, dynamic behavior, and data sparsity. This paper proposes a novel end-to-end deep learning framework, UApredictor, which uses stacked spatial-temporal-interaction blocks to predict urban anomalies from multivariate time-series data. We model the problem with an attributed graph in which city regions are represented as nodes, capturing inter-region spatial information with a spatial transformer. To capture temporal correlation we use a temporal transformer, while an interaction module retains the complex interactions between the spatial and temporal dimensions. In addition, an attention layer on top of the spatial-temporal-interaction block captures the information most important for predicting urban anomalies. We evaluate the framework on the real-world NYC-Urban Anomaly, NYC-Taxi, NYC-POI, NYC-Road Network, NYC-Demographic, and NYC-Weather datasets of New York City. The results show that our framework outperforms baseline and state-of-the-art models in terms of F-measure, macro-F1, and micro-F1.
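To make the spatial-transformer idea concrete, the following is a minimal single-head self-attention over region embeddings, attending across regions at each time step. The shapes and the single-head design are assumptions for illustration, not UApredictor's actual block:

```python
# A minimal sketch of spatial self-attention over region embeddings, assuming
# features of shape (regions, time, dim). Not the paper's exact module.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                 # x: (regions, time, dim)
        xt = x.transpose(0, 1)            # (time, regions, dim): attend across regions
        q, k, v = self.q(xt), self.k(xt), self.v(xt)
        att = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return (att @ v).transpose(0, 1)  # back to (regions, time, dim)

x = torch.randn(64, 24, 32)               # 64 regions, 24 time steps, 32-dim features
print(SpatialAttention(32)(x).shape)      # torch.Size([64, 24, 32])
```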
{"title":"UApredictor: Urban Anomaly Prediction from Spatial-Temporal Data using Graph Transformer Neural Network","authors":"Bhumika, D. Das","doi":"10.1109/IJCNN55064.2022.9892885","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892885","url":null,"abstract":"Urban anomalies are abnormal events such as a blocked driveway, illegal parking, noise, crime, crowd gathering, etc. affect people and policy managers drastically if not handled in time. Prediction of these anomalies in the early stages is critical for public safety and mitigation of economic losses. However, predicting urban anomalies has various challenges like complex spatio-temporal relationships, dynamic nature, and data sparsity. This paper proposes a novel end-to-end deep learning based framework, i.e., UApredictor that utilizes stacked spatial-temporal-interaction block to predict urban anomaly from multivariate time-series data. We model the problem using an attribute graph, where we represent city regions as nodes to capture inter region spatial information using a spatial transformer. Further, to capture temporal correlation, we utilize a temporal transformer, and the interaction module retains complex interaction between spatio-temporal dimensions. Besides, the attention layer is added on the top of the spatial-temporal-interaction block that captures important information for predicting urban anomaly. We use real-world NYC-Urban Anomaly, NYC-Taxi, NYC-POI, NYC-Road Network, NYC-Demographic, and NYC-Weather datasets of New York city to evaluate the urban anomaly prediction framework. The results show that our proposed framework predicts better in terms of F-measure, macro-F1, and micro-F1 than baseline and state-of-the-art models.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130744416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced EfficientNet Network for Classifying Laparoscopy Videos using Transfer Learning Technique
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9891989
Divya Acharya, Guda Ramachandra Kaladhara Sarma, Kameshwar Raovenkatajammalamadaka
Recent years have seen great interest in surgical data science (SDS) methods and imaging technologies. As a result of these developments, surgeons can execute less invasive procedures. In this work, the authors classify laparoscopic video frames of surgical activities into pathology and no-pathology cases using a transfer learning technique named enhanced ENet (eENet), built on the EfficientNet network. Two base versions of the EfficientNet model, ENetB0 and ENetB7, along with the two proposed enhanced versions, eENetB0 and eENetB7, are implemented in the proposed framework using the publicly available GLENDA [1] dataset. The proposed eENetB0 and eENetB7 models classify the features extracted via transfer learning into two classes. With 70–30 data splitting and 10-fold cross-validation (10-fold CV), the eENetB0 model achieved maximum classification accuracies of 88.43% and 97.59%, and the eENetB7 model achieved 97.72% and 98.78%, respectively. We also compared the proposed enhanced versions of EfficientNet (eENetB0 and eENetB7) with the base versions (ENetB0 and ENetB7); among these four models, eENetB7 performed best. For GUI-based visualization, we also created a platform named IAS.ai that detects surgical video clips containing blood and dry scenarios and uses explainable AI to unbox the deep learning model's behavior; IAS.ai is a real-time application of our approach. For further validation, we compared our framework's performance with other leading approaches in the literature [2]–[4], showing how well the proposed eENet model compares with existing models and current best practices.
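A minimal transfer-learning setup along these lines, using torchvision's stock EfficientNet-B0 (torchvision >= 0.13), might look as follows. It is an illustrative baseline with a frozen backbone and a binary head, not the authors' enhanced eENet:

```python
# Sketch: ImageNet-pretrained EfficientNet-B0 adapted to the binary
# pathology / no-pathology task. Freezing the backbone is one common choice;
# the paper's enhancements are not reproduced here.
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
for p in model.features.parameters():    # keep pretrained features fixed
    p.requires_grad = False
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)  # binary head
```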
{"title":"Enhanced EfficientNet Network for Classifying Laparoscopy Videos using Transfer Learning Technique","authors":"Divya Acharya, Guda Ramachandra Kaladhara Sarma, Kameshwar Raovenkatajammalamadaka","doi":"10.1109/IJCNN55064.2022.9891989","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9891989","url":null,"abstract":"Recent days have seen a lot of interest in surgical data science (SDS) methods and imaging technologies. As a result of these developments, surgeons may execute less invasive procedures. Using pathology and no pathology situations to classify laparoscopic video pictures of surgical activities, in this research work authors conducted their investigation using a transfer learning technique named enhanced ENet (eENet) network based on enhanced EfficientNet network. Two base versions of the EfficientNet model named ENetB0 and ENetB7 along with the two proposed versions of the EfficientNet network as enhanced EfficientNetB0 (eENetB0) and enhanced EfficientnetB7 (eENetB7) are implemented in the proposed framework using publicly available GLENDA [1] dataset. The proposed eENetB0 and eENetB7 models have classified the features extracted using the transfer learning technique into binary classification. For 70–30 and 10-fold Cross-Validation (10-fold CV), the data splitting eENetB0 model has achieved maximum classification accuracy as 88.43% and 97.59%, and the eENetB7 model has achieved 97.72% and 98.78% accuracy. We also compared the performance of our proposed enhanced version of EfficientNet (eENetB0 and eENetB7) with the base version of the models (ENetB0 and ENetB7) it shows that among these four models eENetB7 performed well. For GUI-based visualization purposes, we also created a platform named IAS.ai that detects the surgical video clips having blood and dry scenarios and uses explainable AI for unboxing the deep learning model's performance. IAS.ai is a real-time application of our approach. For further validation, we compared our framework's performance with other leading approaches cited in the literature [2]–[4]. We can see how well the proposed eENet model does compare to existing models, as well as the current best practices.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130987443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Link Prediction Model of Dynamic Heterogeneous Network Based on Transformer
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892546
Beibei Ruan, Cui Zhu, Wenjun Zhu
Inductive learning, which can embed newly unseen nodes, has always been a research challenge. It is frequently encountered in practical applications of graph networks, yet there is little research on link prediction in dynamic heterogeneous networks. We therefore propose a Heterogeneous and Temporal model based on the Transformer (HT-Trans) for dynamic heterogeneous networks, whose core idea is to introduce a transformer that integrates neighbor information to better capture the network structure. The goal of HT-Trans is to infer proper embeddings for both existing and unseen nodes. Experimental results on three real datasets show that the proposed algorithm is significantly competitive with baselines on link prediction tasks.
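For context, link prediction from learned node embeddings is typically scored as below. The sigmoid dot-product scorer is a standard convention assumed here, not necessarily HT-Trans's exact decoder:

```python
# Generic link-prediction scorer: the probability of an edge (u, v) is the
# sigmoid of the embeddings' dot product. An assumed, standard recipe.
import torch

def link_score(z, u, v):
    """z: (num_nodes, dim) node embeddings; u, v: LongTensors of node indices."""
    return torch.sigmoid((z[u] * z[v]).sum(dim=-1))   # P(edge u-v)

z = torch.randn(100, 64)                              # embeddings from any encoder
print(link_score(z, torch.tensor([0, 1]), torch.tensor([2, 3])))
```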
{"title":"A Link Prediction Model of Dynamic Heterogeneous Network Based on Transformer","authors":"Beibei Ruan, Cui Zhu, Wenjun Zhu","doi":"10.1109/IJCNN55064.2022.9892546","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892546","url":null,"abstract":"It has always been a challenge to research inductive learning, which can embed newly unseen nodes. Inductive learning is a frequently encountered problem in practical applications of graph networks, but there is little research on dynamic heterogeneous network link prediction. Therefore, we propose a Heterogeneous and Temporal Model Based on Transformer (HT-Trans) for dynamic heterogeneous network, which core idea is to introduce transformer to integrate better neighbor information to capture network structure. The goal of HT-Trans is to infer proper embedding for existing nodes and unseen nodes. Experimental results show that the algorithm proposed in this paper is significantly competitive compared with baselines for link prediction tasks on three real datasets.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131107739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of Multi-task Models for Emotion-Aware Gender Prediction
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892404
Chanchal Suman, Abhishek Singh, S. Saha, P. Bhattacharyya
With the rise of personalized online services, a huge opportunity for user profiling has emerged. Gender plays a very important role for services that rely on information about a user's background; however, due to anonymity and privacy, a user's gender is usually unavailable to other users. Social networking sites provide users with many features to express their thoughts and emotions, whether through pictures, emojis, or written text. Based on the idea that female and male users differ somewhat in the content of their posts and messages, social media accounts can be analyzed through their textual posts to infer the user's gender. In this work, we explore different emotion-aided multimodal gender prediction models. The basic intuition behind our approach is to predict a user's gender from the emotional clues present in their multimodal posts, which include both text and images. For the experiments, the PAN 2018 dataset is enriched with emotion labels, and different multi-tasking architectures are developed for gender prediction. Results on the benchmark PAN-2018 dataset show that the proposed multimodal emotion-aided system outperforms the single-modal (text-only and image-only) models as well as the state-of-the-art system.
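A minimal sketch of such a multi-task architecture, with a shared encoder and separate gender and emotion heads trained under a summed loss, could look like this; the bag-of-words encoder and all dimensions are illustrative assumptions, not the paper's models:

```python
# Multi-task sketch: shared encoder, two task heads, joint loss.
import torch
import torch.nn as nn

class EmotionAidedGender(nn.Module):
    def __init__(self, vocab=5000, hidden=128, emotions=6):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(vocab, hidden), nn.ReLU())
        self.gender_head = nn.Linear(hidden, 2)        # two gender classes
        self.emotion_head = nn.Linear(hidden, emotions)

    def forward(self, x):
        h = self.shared(x)                             # features shared by both tasks
        return self.gender_head(h), self.emotion_head(h)

model = EmotionAidedGender()
x = torch.rand(8, 5000)                                # batch of bag-of-words vectors
g_logits, e_logits = model(x)
loss = nn.functional.cross_entropy(g_logits, torch.randint(2, (8,))) + \
       nn.functional.cross_entropy(e_logits, torch.randint(6, (8,)))
loss.backward()                                        # joint multi-task update
```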
{"title":"Development of Multi-task Models for Emotion-Aware Gender Prediction","authors":"Chanchal Suman, Abhishek Singh, S. Saha, P. Bhattacharyya","doi":"10.1109/IJCNN55064.2022.9892404","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892404","url":null,"abstract":"With the rise of personalized online services, a huge opportunity for user profiling has developed. Gender plays a very important role for services that rely on information about a user's background. Although, due to anonymity and privacy, the gender information of a user is usually unavailable for other users. Social Networking sites have provided users with a lot of features to express their thoughts and emotions either using pictures or emojis or writing texts. Based on the idea that female and male users have some differences in their post and message contents, social media accounts can be analyzed using their textual posts for finding the user's gender. In this work, we explore different emotion-aided multi-modal gender prediction models. The basic intuition behind our proposed approach is to predict the gender of a user based on the emotional clues present in their multimodal posts, which includes texts as well as images. PAN 2018 dataset is enriched with emotion labels, for the experimentation. Different multi-tasking based architectures have been developed for gender prediction. Obtained results on the benchmark PAN-2018 dataset illustrate that the proposed multimodal emotion-aided system performs better than the single modal (with only text and only image) based models and the state of the art system too.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133001785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Density and Context Aware Network with Hierarchical Head for Traffic Scene Detection
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892125
Zuhao Ge, Wenhao Yu, Xian Liu, Lizhe Qi, Yunquan Sun
We investigate traffic scene detection from surveillance cameras and UAVs. This task is rather challenging, mainly due to the spatially nonuniform gathering, large scale variance, and instance-level imbalanced distribution of vehicles. Most existing methods that employ an FPN to enrich features are prone to failure in this scenario. To mitigate these influences, we propose a novel detector called the Density and Context Aware Network (DCANet), which can focus on dense regions and adaptively aggregate context features. Specifically, DCANet consists of three components: Density Map Supervision (DMP), Context Feature Aggregation (CFA), and a Hierarchical Head Module (HHM). DMP is designed to capture the gathering information of objects under density-map supervision. CFA exploits the relationships between adjacent feature layers to enhance ROI-level contextual information. Finally, HHM is introduced to classify and locate imbalanced objects with hierarchical heads. Without bells and whistles, DCANet can be used in any two-stage detector. Extensive experiments on two widely used traffic detection datasets, CityCam and VisDrone, show that DCANet sets new state-of-the-art scores on CityCam.
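For intuition about density-map supervision, the standard recipe builds the target map by blurring unit impulses at object centers with a Gaussian, as sketched below. The fixed sigma is a simplification (crowd-counting work often scales it to the local object size), and this is not claimed to be DCANet's exact procedure:

```python
# Build a supervision density map from object centers; the map's integral
# stays equal to the object count after Gaussian blurring.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(centers, shape, sigma=4.0):
    """centers: list of (row, col) object centers; shape: (H, W) of the image."""
    dm = np.zeros(shape, dtype=np.float32)
    for r, c in centers:
        dm[int(r), int(c)] += 1.0          # unit impulse per object
    return gaussian_filter(dm, sigma)      # smooth into a density

dm = density_map([(50, 60), (52, 64), (200, 300)], (480, 640))
print(dm.sum())                            # ~3.0, the number of vehicles
```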
{"title":"Density and Context Aware Network with Hierarchical Head for Traffic Scene Detection","authors":"Zuhao Ge, Wenhao Yu, Xian Liu, Lizhe Qi, Yunquan Sun","doi":"10.1109/IJCNN55064.2022.9892125","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892125","url":null,"abstract":"We investigate traffic scene detection from surveillance cameras and UAVs. This task is rather challenging, mainly due to the spatial nonuniform gathering, large-scale variance, and instance-level imbalanced distribution of vehicles. Most existing methods that employed FPN to enrich features are prone to failure in this scenario. To mitigate the influences above, we propose a novel detector called Density and Context Aware Network(DCANet) that can focus on dense regions and adaptively aggregate context features. Specifically, DCANet consists of three components: Density Map Supervision(DMP), Context Feature Aggregation(CFA), and Hierarchical Head Module(HHM). DMP is designed to capture the gathering information of objects supervised by density maps. CFA exploits adjacent feature layers' relationships to fulfill ROI-level contextual information enhancement. Finally, HHM is introduced to classify and locate imbalanced objects employed in hierarchical heads. Without bells and whistles, DCANet can be used in any two-stage detectors. Extensive experiments are carried out on the two widely used traffic detection datasets, CityCam and VisDrone, and DCANet reports new state-of-the-art scores on the CityCam.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133084714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating Adaptive Targeted Adversarial Examples for Content-Based Image Retrieval
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892178
Jiameng Pan, Xiaoguang Zhu, Peilin Liu
The massive amount of personal data accessible on the Internet raises the risk of malicious retrieval. In this paper, we propose concealing images with targeted adversarial attacks on content-based image retrieval. An imperceptible perturbation is added to the original image to generate an adversarial example whose retrieval results resemble those of a target image, even though the two images look completely different. Previous work on targeted attacks for image retrieval introduces a target-specific model and must retrain it for each new target. We extend the attack's adaptability by exploiting target images as conditional input to a generative model. The proposed Adaptive Targeted Attack Generative Adversarial Network (ATA-GAN) is a GAN-based model with a generator and a discriminator. The generator extracts the features of the original and target images, then uses a Feature Integration Module to explore the relation between them, ignoring the original features while paying more attention to the target. Simultaneously, the discriminator distinguishes realness and ensures that the adversarial example remains similar to the original. We evaluate and analyze the performance of the adaptive targeted attack on popular retrieval benchmarks.
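An optimization-based variant of this attack objective, pushing an image's features toward the target's within an L-infinity ball (iterative FGSM/PGD style), is sketched below for intuition. ATA-GAN instead trains a generator to produce the perturbation in one pass:

```python
# Hedged sketch of a targeted feature-space attack; eps, alpha, and the step
# count are illustrative, and f stands for any differentiable retrieval
# feature extractor.
import torch

def targeted_attack(f, x, x_tgt, eps=8 / 255, alpha=1 / 255, steps=40):
    """f: feature extractor; x, x_tgt: image tensors with values in [0, 1]."""
    with torch.no_grad():
        tgt_feat = f(x_tgt)
    x_adv = x.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = torch.norm(f(x_adv) - tgt_feat)       # distance to target features
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - alpha * x_adv.grad.sign()   # step toward the target
            x_adv = x + (x_adv - x).clamp(-eps, eps)    # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                   # keep a valid image
    return x_adv.detach()
```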
{"title":"Generating Adaptive Targeted Adversarial Examples for Content-Based Image Retrieval","authors":"Jiameng Pan, Xiaoguang Zhu, Peilin Liu","doi":"10.1109/IJCNN55064.2022.9892178","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892178","url":null,"abstract":"Massive accessible personal data on the Internet raises the risk of malicious retrieval. In this paper, we propose to conceal the images with the targeted adversarial attacks on content-based image retrieval. An imperceptible perturbation is added to the original image to generate adversarial examples, making the retrieval results similar to the target image but look completely different. Previous work on the targeted attack for image retrieval only introduces a target-specific model and needs to train the model each time for new targets. We extend the attack adaptability by exploiting the target images as conditional input for the generative model. The proposed Adaptive Targeted Attack Generative Adversarial Network (ATA-GAN) is a GAN-based model with a generator and discriminator. The generator extracts the features of origin and target, then uses the Feature Integration Module to explore the relation between the target and original image to ignore the origin feature while paying more attention to the target. Simultaneously, the discriminator distinguishes the realness and ensures the adversarial example is similar to the origin. We evaluate and analyze the performance of the adaptive targeted attack on popular retrieval benchmarks.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133361781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CoVal-SGAN: A Complex-Valued Spectral GAN architecture for the effective audio data augmentation in construction sites
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9891915
M. Scarpiniti, Cristiano Mauri, D. Comminiello, A. Uncini, Yong-Cheol Lee
Generative audio data augmentation for construction sites is a challenging research area due to the high dissimilarity between the work sounds of the machines and equipment involved. It is nevertheless necessary, since audio data for critical work classes is often rare. Motivated by these considerations and demands, in this paper we propose a complex-valued GAN architecture that works on the audio spectrogram, named CoVal-SGAN, for effective augmentation of audio data. Specifically, the proposed CoVal-SGAN exploits both magnitude and phase information to improve the quality of the artificially generated audio signals and increase the overall performance of the underlying classifier. Numerical results on data recorded at real-world construction sites, along with comparisons with available state-of-the-art approaches, show the effectiveness of the proposed idea through improved accuracy.
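For reference, the complex spectrogram such a model operates on carries both magnitude and phase, and the waveform can be resynthesized from the (possibly modified) complex values. The SciPy STFT below is an illustrative front end, not necessarily the paper's:

```python
# Round-trip through the complex STFT: split into magnitude and phase (the
# two parts a complex-valued GAN would model), recombine, and resynthesize.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
x = np.random.randn(fs)                       # 1 s placeholder for a site recording
f, t, Z = stft(x, fs=fs, nperseg=512)         # Z is complex: magnitude and phase
mag, phase = np.abs(Z), np.angle(Z)
Z_rec = mag * np.exp(1j * phase)              # a generator would output both parts
_, x_rec = istft(Z_rec, fs=fs, nperseg=512)
print(np.allclose(x, x_rec[: len(x)]))        # near-perfect reconstruction
```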
{"title":"CoVal-SGAN: A Complex-Valued Spectral GAN architecture for the effective audio data augmentation in construction sites","authors":"M. Scarpiniti, Cristiano Mauri, D. Comminiello, A. Uncini, Yong-Cheol Lee","doi":"10.1109/IJCNN55064.2022.9891915","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9891915","url":null,"abstract":"Generative audio data augmentation in a construction site is one of challenging research areas due to the high dissimilarity between work sounds of involved machines and equipment. However, it becomes necessary since the availability of audio data of critical work classes is often rare. Motivated by these considerations and demands, in this paper, we propose a complex-valued GAN architecture working with the audio spectrogram, named CoVal-SGAN, for an effective augmentation of audio data. Specifically, the proposed CoVal-SGAN exploits both the magnitude and phase information to improve the quality of the artificially generated audio signals and increase the overall performance of the underlying classifier. Numerical results, performed on the data recorded in real-world construction sites, along with the comparisons with available state-of-the-art approaches, show the effectiveness of the proposed idea by obtaining an improved accuracy.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133297970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Information Geometric Perspective to Adversarial Attacks and Defenses
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892170
Kyle Naddeo, N. Bouaynaya, R. Shterenberg
Deep learning models have achieved state-of-the-art accuracy in complex tasks, sometimes surpassing human-level accuracy. Yet they suffer from vulnerabilities known as adversarial attacks: imperceptible input perturbations that fool the models on inputs that were originally classified correctly. The adversarial problem remains poorly understood and is commonly thought to be an inherent weakness of deep learning models. We argue that understanding and alleviating the adversarial phenomenon may require us to go beyond the Euclidean view and consider the relationship between the input and output spaces as a statistical manifold with the Fisher Information as its Riemannian metric. Under this information geometric view, the optimal attack is constructed as the direction corresponding to the highest eigenvalue of the Fisher Information Matrix, called the Fisher spectral attack. We show that an orthogonal transformation of the data cleverly alters its manifold by keeping the highest eigenvalue but changing the optimal direction of attack, thus deceiving the attacker into adopting the wrong direction. We demonstrate the defensive capabilities of the proposed orthogonal scheme against the Fisher spectral attack and the popular fast gradient sign method on standard networks, e.g., LeNet and MobileNetV2, and on the benchmark datasets MNIST and CIFAR-10.
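The fast gradient sign method used as the baseline attack takes a single signed gradient step on the loss. A standard formulation (after Goodfellow et al.), not code from the paper, is:

```python
# FGSM: perturb the input by eps in the sign direction of the loss gradient.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```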
{"title":"An Information Geometric Perspective to Adversarial Attacks and Defenses","authors":"Kyle Naddeo, N. Bouaynaya, R. Shterenberg","doi":"10.1109/IJCNN55064.2022.9892170","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892170","url":null,"abstract":"Deep learning models have achieved state-of-the-art accuracy in complex tasks, sometimes outperforming human-level accuracy. Yet, they suffer from vulnerabilities known as adversarial attacks, which are imperceptible input perturbations that fool the models on inputs that were originally classified correctly. The adversarial problem remains poorly understood and commonly thought to be an inherent weakness of deep learning models. We argue that understanding and alleviating the adversarial phenomenon may require us to go beyond the Euclidean view and consider the relationship between the input and output spaces as a statistical manifold with the Fisher Information as its Riemannian metric. Under this information geometric view, the optimal attack is constructed as the direction corresponding to the highest eigenvalue of the Fisher Information Matrix - called the Fisher spectral attack. We show that an orthogonal transformation of the data cleverly alters its manifold by keeping the highest eigenvalue but changing the optimal direction of attack; thus deceiving the attacker into adopting the wrong direction. We demonstrate the defensive capabilities of the proposed orthogonal scheme - against the Fisher spectral attack and the popular fast gradient sign method - on standard networks, e.g., LeNet and MobileNetV2 for benchmark data sets, MNIST and CIFAR-10.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133639169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Bio-inspired Dark Adaptation Framework for Low-light Image Enhancement
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892877
Fang Lei
In low-light conditions, image enhancement is critical for vision-based artificial systems, since the details of objects in dark regions are buried. Moreover, enhancing a low-light image without introducing too many irrelevant artifacts is important for visual tasks such as motion detection; conventional methods, however, always carry the risk of a "bad" enhancement. Nocturnal insects show remarkable visual abilities at night, and their adaptations in light response provide inspiration for low-light image enhancement. In this paper, we adopt the neural mechanism of dark adaptation to adaptively raise intensities while preserving naturalness. We propose a framework that enhances low-light images by applying the dark adaptation operation, with proper adaptation parameters, in the R, G, and B channels separately. Specifically, the dark adaptation consists of a series of canonical neural computations, including power-law adaptation, divisive normalization, and adaptive rescaling. Experiments show that the proposed bio-inspired dark adaptation framework is more efficient and better preserves the naturalness of the image than existing methods.
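A hedged per-channel sketch of two of the named computations, power-law adaptation followed by divisive normalization against a local mean, is given below. The exponent, semi-saturation constant, and Gaussian scale are illustrative, whereas the paper fits proper adaptation parameters per channel:

```python
# Illustrative dark-adaptation step per RGB channel: gamma-like compression,
# then division by a local luminance estimate, then rescaling to [0, 1].
import numpy as np
from scipy.ndimage import gaussian_filter

def dark_adapt(img, gamma=0.5, k=0.1, sigma=15.0):
    """img: float RGB array in [0, 1], shape (H, W, 3)."""
    out = np.empty_like(img)
    for c in range(3):                        # R, G, B treated separately
        p = img[..., c] ** gamma              # power-law adaptation
        local = gaussian_filter(p, sigma)     # local luminance estimate
        out[..., c] = p / (local + k)         # divisive normalization
    return np.clip(out / out.max(), 0.0, 1.0) # adaptive rescaling
```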
{"title":"A Bio-inspired Dark Adaptation Framework for Low-light Image Enhancement","authors":"Fang Lei","doi":"10.1109/IJCNN55064.2022.9892877","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892877","url":null,"abstract":"In low light conditions, image enhancement is critical for vision-based artificial systems since details of objects in dark regions are buried. Moreover, enhancing the low-light image without introducing too many irrelevant artifacts is important for visual tasks like motion detection. However, conventional methods always have the risk of “bad” enhancement. Nocturnal insects show remarkable visual abilities at night time, and their adaptations in light responses provide inspiration for low-light image enhancement. In this paper, we aim to adopt the neural mechanism of dark adaptation for adaptively raising intensities whilst preserving the naturalness. We propose a framework for enhancing low-light images by implementing the dark adaptation operation with proper adaptation parameters in R, G and B channels separately. Specifically, the dark adaptation in this paper consists of a series of canonical neural computations, including the power law adaptation, divisive normalization and adaptive rescaling operations. Experiments show that the proposed bio-inspired dark adaptation framework is more efficient and can better preserve the naturalness of the image compared to existing methods.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"66 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132772584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}