An Explainable Tool to Support Age-related Macular Degeneration Diagnosis
Lourdes Martínez-Villaseñor, Hiram Ponce, Antonieta Martínez-Velasco, Luis Miralles-Pechuán
2022 International Joint Conference on Neural Networks (IJCNN) | Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892895
Artificial intelligence, and deep learning in particular, have attracted considerable attention in the ophthalmology community owing to the possibility of processing large amounts of data and digitized ocular images. Intelligent systems are being developed to support the diagnosis and treatment of a number of ophthalmic diseases such as age-related macular degeneration (AMD), glaucoma, and retinopathy of prematurity. Explainability is therefore necessary to gain trust in, and hence adoption of, these critical decision support systems. Visual explanations have been proposed for AMD diagnosis only when optical coherence tomography (OCT) images are used, whereas interpretability for other inputs (i.e., data point-based features) remains rather limited. In this paper, we propose a practical tool to support AMD diagnosis based on Artificial Hydrocarbon Networks (AHN) with different kinds of input data, such as demographic characteristics, features known as risk factors for AMD, and genetic variants obtained from DNA genotyping. The proposed explainer, namely eXplainable Artificial Hydrocarbon Networks (XAHN), can produce global and local interpretations of the AHN model. An explainability assessment of XAHN was conducted with clinicians to obtain feedback on the tool. We expect the XAHN explainer will be beneficial for supporting expert clinicians in AMD diagnosis, especially where the input data are not visual.
Multi-scale Local Region Relation Attention in Convolutional Neural Networks for Facial Action Unit Intensity Prediction
Anrui Wang, Weiyang Chen
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892729
Facial Action Unit (FAU) intensity describes the degree of change in the appearance of a specific facial location and can be used to analyze human facial behavior. Because changes in FAUs are subtle, FAU intensity prediction still faces great challenges. Previous works using attention mechanisms for FAU intensity prediction either simply crop the FAU regions or directly apply attention mechanisms to obtain local FAU representations, but these methods do not capture FAU intensity features well at different scales and locations. In addition, the dependencies between FAUs also carry important information. In this paper, we propose a multi-scale local-region relational attention model based on convolutional neural networks (CNNs) for FAU intensity prediction. Specifically, we first reflect the relationships between FAUs by adjusting the luminance values of face images to capture local features with pixel-level relationships. Then, we use the introduced multi-scale local-region relational attention model to extract latent relational features of FAUs. Finally, we combine these latent relational features, facial geometry information, and deep global features captured with an autoencoder to achieve robust FAU intensity prediction. The method is evaluated on the public benchmark dataset DISFA, and experimental results show that our method achieves performance comparable to state-of-the-art methods, validating the effectiveness of the multi-scale local-region relational attention model for FAU intensity prediction.
Comparing the Robustness of Classical and Deep Learning Techniques for Text Classification
Quynh Tran, Krystsina Shpileuskaya, Elaine Zaunseder, Larissa Putzar, S. Blankenburg
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892242
Deep learning algorithms achieve exceptional accuracy on various tasks. Despite this success, these models are known to be prone to errors, i.e., low in robustness, due to differences between the training and production environments. One might assume that greater model complexity translates directly into greater robustness. We therefore compare simple, classical models (logistic regression, support vector machines) with complex deep learning techniques (convolutional neural networks, transformers) to provide novel insights into the robustness of machine learning systems. In our approach, we assess robustness by developing and applying three realistic perturbations, mimicking the scanning, typing, and speech recognition errors that occur in inputs to text classification tasks. We perform a thorough study analyzing the impact of the different perturbations, with variable strengths, at the character and word level. A noteworthy finding is that algorithms with low complexity can achieve high robustness. Additionally, we demonstrate that augmented training with a specific perturbation can strengthen the chosen models' robustness against other perturbations without reducing their accuracy. Our results can inform the selection of machine learning models and provide a guideline for examining the robustness of text classification methods in real-world applications. Moreover, our implementation is publicly available, which contributes to the development of more robust machine learning systems.
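As a rough illustration of the kind of typing-error perturbation the abstract describes, here is a minimal sketch; the function name, the QWERTY neighbor map, and the strength semantics are our assumptions for illustration, not the authors' implementation:

```python
import random

# Illustrative QWERTY-neighbor map (deliberately tiny; a real one would
# cover the full keyboard layout).
QWERTY_NEIGHBORS = {"a": "qs", "e": "wr", "o": "ip", "t": "ry"}

def perturb_typing(text: str, strength: float, rng: random.Random) -> str:
    """Replace roughly a `strength` fraction of mapped characters
    with a random QWERTY neighbor, mimicking typing errors."""
    chars = list(text)
    for i, c in enumerate(chars):
        if c.lower() in QWERTY_NEIGHBORS and rng.random() < strength:
            chars[i] = rng.choice(QWERTY_NEIGHBORS[c.lower()])
    return "".join(chars)
```

Sweeping `strength` from 0 to 1 would then produce the "variable strengths" evaluated in the study.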
Dual-stream Speech Dereverberation Network Using Long-term and Short-term Cues
Nan Li, Meng Ge, Longbiao Wang, J. Dang
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892662
Under reverberation, the current speech frame is usually influenced by previous frames. Traditional neural-network-based speech dereverberation (SD) methods directly map the current speech frame, which carries only short-term cues, to clean speech or learn a mask; they cannot exploit long-term information to remove late reverberation, which further limits SD performance. To address this issue, we propose a dual-stream speech dereverberation network (DualSDNet) using long-term and short-term cues. First, motivated by the reverberation generation process, we analyze the effectiveness of a finite impulse response (FIR) filter that records long-term information. Second, to make full use of both long-term and short-term information, we design a dual-stream network that maps both long and short speech segments to high-dimensional representations and pays more attention to the more informative time indices. Results on the REVERB Challenge data show that DualSDNet consistently outperforms state-of-the-art SD baselines.
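The FIR view of reverberation mentioned above can be sketched as a plain convolution: each output sample is a weighted sum of the current and past input samples, with the taps playing the role of a room impulse response. This is a generic FIR sketch under assumed names and toy tap values, not the paper's filter:

```python
def fir_filter(signal, taps):
    """Apply an FIR filter: out[n] = sum_k taps[k] * signal[n - k].
    With taps standing in for a room impulse response, this models how
    late reverberation mixes past frames into the current one."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out
```

Feeding a unit impulse through the filter recovers the taps themselves, which is why long tap sequences capture long-term (late) reverberation.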
HierRL: Hierarchical Reinforcement Learning for Task Scheduling in Distributed Systems
Yanxia Guan, Yuntao Liu, Yuan Li, Xinhai Xu
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892507
The distributed system Ray has attracted much attention for decision-making applications. It provides a flexible and powerful distributed execution mechanism for training learning algorithms, automatically mapping computation tasks to resources. Task scheduling is a critical component of Ray and adopts a two-layer structure. It uses a simple general scheduling principle, which leaves much room for optimization. In this paper, we study the two-layer scheduling problem in Ray, casting it as an optimization problem. We first present a comprehensive formulation of the problem and show that it is NP-hard. We then design a hierarchical reinforcement learning method, named HierRL, which consists of a high-level agent and a low-level agent, with carefully designed state spaces, action spaces, and reward functions. At the high level, we devise a value-based reinforcement learning method that allocates each task to an appropriate low-level node. With tasks allocated from the high level and generated by applications, a low-level reinforcement learning method selects tasks from the queue for execution. A hierarchical policy learning method is introduced to train the two-layer agents. Finally, we simulate the two-layer scheduling procedure on a public platform, CloudSim, with tasks from a real dataset generated by the Alibaba Cluster Trace Program. The results show that the proposed method performs much better than Ray's original scheduling method.
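The high-level, value-based allocation step can be caricatured in a few lines. The sketch below is a tabular, bandit-style simplification under assumed names (`select_node`, `q_update`, a flat (task, node) value table); the paper's actual agent uses richer state and bootstrapped returns:

```python
def select_node(q, task, nodes):
    """Greedy high-level action: send the task to the node with the
    highest estimated value (unseen pairs default to 0)."""
    return max(nodes, key=lambda n: q.get((task, n), 0.0))

def q_update(q, task, node, reward, alpha=0.1):
    """Move the (task, node) value toward the observed reward.
    This omits the next-state bootstrap of full Q-learning."""
    key = (task, node)
    q[key] = q.get(key, 0.0) + alpha * (reward - q.get(key, 0.0))
    return q
```

Repeating select-then-update over a stream of tasks is the essence of the high-level loop; the low-level agent would play the analogous game over its task queue.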
Q-SYM2 and Automatic Scrap Classification a joint solution for the Circular economy and sustainability of Steel Manufacturing, to ensure the scrap yard operates competitively
Davide Armellini, M. Ometto, Cristiano Ponton
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892611
Once a hype topic and now an established idea, the possibility of increasing plant efficiency by gaining and applying better awareness of how scrap performs in the melting process has drawn wide interest. Scrap management becomes the key point of cost reduction, since scrap can account for up to 50% of overall production costs. Technological innovations promise to drive better raw material management, shortening acquisition time and reducing waste during the metallurgical process. Expensive raw materials require heavy involvement of plant resources and are highly dependent on the human factor. All quality and logistics decisions rest on the judgment of the operators, increasing the chance of non-conformities (e.g., erroneous classification, material discharged in the wrong location, errors loading material into the buckets). To overcome these issues, online classification of scrap is the keystone. Starting from the arrival of scrap at the plant, through the acceptance of the delivery note and the check-in of the carriers, Automatic Scrap Classification supports inbound-scrap control and classification, enabling real-time traceability of scrap inside the bays. The Quality Control System benefits from full details of the material used in production. Danieli Automation implemented Q-ASC, a system that, leveraging Artificial Intelligence (AI) and deep learning techniques, assists scrap classification procedures through computer vision and automatic scrap recognition. The goal of scrap identification is to localize and assign a specific class label to a given visual sample of scrap or inert/hazardous material. Classification can be conducted using different methodologies based on material shapes or dimensions. Q-ASC is the entry point for Scrap Yard Management and can be considered the central data hub for managing scrap inbound to the plant, connecting all systems that require reliable scrap data.
DeepSet: Deep Learning-based Recommendation with Setwise Preference
Lin Li, Weike Pan, Guanliang Chen, Zhong Ming
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892627
Recommendation methods based on deep learning frameworks have increased drastically over recent years, covering virtually all sub-topics in recommender systems. Among these topics, one-class collaborative filtering (OCCF) is a fundamental problem and has been studied most extensively. However, most existing deep learning-based OCCF methods focus either on defining new prediction rules by replacing conventional shallow, linear inner products with a variety of neural architectures, or on learning more expressive user and item factors with neural networks; they may still suffer from inferior recommendation performance because the underlying preference assumptions are typically defined on single items. In this paper, we propose to address this limitation and exploit the capacity of deep learning-based recommendation methods by adopting a setwise preference as the underlying assumption during model learning. Specifically, we propose a new setwise preference assumption for neural recommendation frameworks and devise a general solution named DeepSet, which enhances the learning ability of neural collaborative filtering methods by activating the setwise preference at different neural layers, namely 1) the feature input layer, 2) the feature output layer, and 3) the prediction layer. Extensive experiments on four commonly used datasets show that our solution effectively boosts the performance of existing deep learning-based methods without introducing any new model parameters.
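To make the contrast with single-item assumptions concrete, a setwise preference at the prediction layer can be sketched as a BPR-style objective where a *set* of observed items, rather than one item, must outscore a negative item. Everything here (mean pooling as the set aggregator, the function name) is our illustrative assumption, not the DeepSet formulation:

```python
import math

def setwise_bpr_loss(pos_set_scores, neg_score):
    """BPR-style setwise loss: the aggregated score of the preferred
    item set should exceed the score of a sampled negative item."""
    # Mean pooling turns per-item scores into one set-level score.
    set_score = sum(pos_set_scores) / len(pos_set_scores)
    # -log(sigmoid(margin)): smaller when the set outscores the negative.
    return -math.log(1.0 / (1.0 + math.exp(-(set_score - neg_score))))
```

The loss shrinks as the set-level margin over the negative grows, which is the pairwise intuition lifted from items to sets.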
Abnormal Event Detection with Self-guiding Multi-instance Ranking Framework
Y. Liu, Jing Liu, Wei Ni, Liang Song
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892231
Detecting abnormal events in surveillance videos with weak supervision is a challenging task that aims to temporally locate abnormal frames using readily accessible video-level labels. In this paper, we propose a self-guiding multi-instance ranking (SMR) framework that explores task-specific deep representations and considers the temporal correlations between video clips. Specifically, we apply a clustering algorithm to fine-tune the features extracted by pre-trained 3D-convolutional models. The clustering module also generates clip-level labels for abnormal videos, and these pseudo-labels are used in part to supervise the training of the multi-instance regression. In implementing the regression module, we compare the effectiveness of various recurrent neural networks, and the results demonstrate the necessity of temporal correlations for weakly supervised video anomaly detection. Experimental results on two standard benchmarks show that the SMR framework is comparable to state-of-the-art approaches, with frame-level AUCs of 81.7% and 92.4% on the UCF-Crime and UCSD Ped2 datasets, respectively. Additionally, ablation studies and visualization results demonstrate the effectiveness of each component, and our framework can accurately locate abnormal events.
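The multi-instance ranking idea behind such frameworks is commonly expressed as a hinge loss over bags of clip scores: the top-scoring clip of an abnormal video should outrank the top-scoring clip of a normal video by a margin. This is the standard MIL ranking formulation, sketched under assumed names; it does not reproduce the specifics of the SMR objective:

```python
def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge-style multi-instance ranking loss over two bags of
    clip-level anomaly scores (one abnormal video, one normal video).
    Only the maximum score of each bag participates, reflecting the
    weak, video-level supervision."""
    return max(0.0, margin - max(abnormal_scores) + max(normal_scores))
```

The loss vanishes once the abnormal bag's best clip clears the normal bag's best clip by the margin, so gradients concentrate on the clips most likely to be anomalous.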
Pub Date : 2022-07-18DOI: 10.1109/IJCNN55064.2022.9892217
Jiangwei Li, J. Zhong
We focus on retrieval-based dialogue systems. Such a system aims to select an appropriate response from a candidate pool for a given context. Recent methods commonly utilize powerful interaction-based pre-trained language models such as BERT to achieve this goal. However, their time cost is often unsatisfactory, since computing relevance scores is inefficient, especially in scenarios that require online response selection. To address this issue, we propose an efficient dialogue system that utilizes a representation-based BERT, which produces an independent representation for each response candidate and context. The relevance score can then be calculated simply as a dot product. We further enhance the representation ability of this model by applying domain adaptive post-training and supervised contrastive learning fine-tuning. Experimental results on two benchmark datasets show that our method achieves competitive performance with interaction-based models while retaining the advantage of time efficiency. We also provide an empirical and theoretical analysis of the time efficiency of representation-based versus interaction-based models. The main contribution of this paper is a novel methodology for building a simple but efficient dialogue system.
{"title":"Building an Efficient Retrieval-based Dialogue System with Contrastive Learning","authors":"Jiangwei Li, J. Zhong","doi":"10.1109/IJCNN55064.2022.9892217","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892217","url":null,"abstract":"We focus on retrieval-based dialogue systems. Such a system aims to select an appropriate response from a candidate pool for a given context. Recent methods commonly utilize powerful interaction-based pre-trained language models such as BERT to achieve this goal. However, their time cost is often unsatisfactory, since computing relevance scores is inefficient, especially in scenarios that require online response selection. To address this issue, we propose an efficient dialogue system that utilizes a representation-based BERT, which produces an independent representation for each response candidate and context. The relevance score can then be calculated simply as a dot product. We further enhance the representation ability of this model by applying domain adaptive post-training and supervised contrastive learning fine-tuning. Experimental results on two benchmark datasets show that our method achieves competitive performance with interaction-based models while retaining the advantage of time efficiency. We also provide an empirical and theoretical analysis of the time efficiency of representation-based versus interaction-based models. The main contribution of this paper is a novel methodology for building a simple but efficient dialogue system.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129203881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
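The dot-product scoring that makes the representation-based approach above fast can be sketched as follows. Candidate embeddings are assumed to be precomputed offline; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def rank_candidates(context_vec, candidate_matrix):
    # One matrix-vector product scores every precomputed candidate
    # embedding against the context; no per-pair encoder pass is needed.
    scores = candidate_matrix @ context_vec
    return np.argsort(-scores)  # candidate indices, best first
```

This is the source of the efficiency gain: an interaction-based model must run the full encoder once per (context, candidate) pair, whereas here online selection reduces to a single matrix-vector product over cached embeddings.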
Pub Date : 2022-07-18DOI: 10.1109/IJCNN55064.2022.9892671
Cristian Sestito, S. Perri, Rob Stewart
Several modern applications in the field of Artificial Intelligence exploit deep learning to make accurate decisions. Recent work on compression techniques allows deep learning applications, such as computer vision, to run on Edge Computing devices. For instance, quantizing the precision of deep learning architectures allows Edge Computing devices to achieve high throughput at low power. Quantization has mainly been applied to multilayer perceptrons and convolution-based models for classification problems. However, its impact on more complex scenarios, such as image up-sampling, is still underexplored. This paper presents a systematic evaluation of the accuracy achieved by quantized neural networks when performing image up-sampling in three different applications: image compression/decompression, synthetic image generation and semantic segmentation. Given the promising aptitude of learnable filters for predicting pixels, transposed convolutional layers are used for up-sampling. Experimental results based on analytical metrics show that acceptable accuracies are reached with quantization spanning between 3 and 7 bits. Based on visual inspection, the 2–6 bit range guarantees appropriate accuracy.
{"title":"Accuracy Evaluation of Transposed Convolution-Based Quantized Neural Networks","authors":"Cristian Sestito, S. Perri, Rob Stewart","doi":"10.1109/IJCNN55064.2022.9892671","DOIUrl":"https://doi.org/10.1109/IJCNN55064.2022.9892671","url":null,"abstract":"Several modern applications in the field of Artificial Intelligence exploit deep learning to make accurate decisions. Recent work on compression techniques allows deep learning applications, such as computer vision, to run on Edge Computing devices. For instance, quantizing the precision of deep learning architectures allows Edge Computing devices to achieve high throughput at low power. Quantization has mainly been applied to multilayer perceptrons and convolution-based models for classification problems. However, its impact on more complex scenarios, such as image up-sampling, is still underexplored. This paper presents a systematic evaluation of the accuracy achieved by quantized neural networks when performing image up-sampling in three different applications: image compression/decompression, synthetic image generation and semantic segmentation. Given the promising aptitude of learnable filters for predicting pixels, transposed convolutional layers are used for up-sampling. Experimental results based on analytical metrics show that acceptable accuracies are reached with quantization spanning between 3 and 7 bits. Based on visual inspection, the 2–6 bit range guarantees appropriate accuracy.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130648767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
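A generic uniform quantizer illustrates the kind of n-bit precision reduction evaluated in the abstract above. This is a sketch under common assumptions (per-tensor range, round-to-nearest), not the paper's exact scheme:

```python
import numpy as np

def quantize_uniform(x, n_bits):
    # Map values to 2**n_bits evenly spaced levels over the array's own
    # range, then map back (a quantize/de-quantize round trip).
    levels = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo
```

The worst-case reconstruction error of such a quantizer is half a step, i.e. (max − min) / (2 · (2**n_bits − 1)), which is why accuracy degrades sharply below a few bits.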