Transfer Learning Framework for Forecasting Fresh Produce Yield and Price
Islam Nasr, L. Nassar, F. Karray
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892192
Accurate estimates of fresh produce (FP) yields and prices are crucial for fair bidding prices by retailers and informed asking prices by farmers, leading to the best prices for customers. To obtain accurate estimates, this work improves the state-of-the-art deep learning (DL) models for forecasting FP yields and prices and proposes a novel transfer learning (TL) framework for better generalizability. The proposed models are trained and tested on real-world datasets for the Santa Barbara region in California, which map environmental input parameters to FP yield and price outputs. Based on an aggregated measure (AGM), the proposed model, an ensemble of an Attention Deep Feedforward Neural Network with Gated Recurrent Unit (GRU) units and a Deep Feedforward Neural Network with embedded GRU units, is found to significantly outperform the state-of-the-art models. Besides identifying the best DL model, the TL framework uses FP similarity, clustering, and TL techniques customized to fit the problem at hand and enhance generalization to other FPs. Similarity algorithms from the literature are improved by considering time-series features rather than the absolute values of their points. In addition, the FPs are clustered using hierarchical clustering with the complete linkage of a dendrogram, which automates the search for similarity thresholds and avoids setting them arbitrarily. Finally, transfer learning is applied by freezing some layers of the proposed ensemble model and fine-tuning the rest, leading to a significant improvement in AGM compared to the best model from the literature.
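The clustering step lends itself to a short sketch. Below is a minimal, illustrative complete-linkage agglomeration in Python; it assumes a user-supplied distance over time-series feature vectors and a fixed merge threshold (the paper instead derives thresholds from the dendrogram automatically), so it is a sketch of the idea, not the authors' implementation.

```python
def complete_linkage_clusters(items, threshold, dist):
    """Agglomerate items bottom-up; cluster distance = max pairwise distance
    (complete linkage). Stop merging once the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(items))]

    def cluster_dist(a, b):
        return max(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > 1:
        (i, j), d = min(
            (((a, b), cluster_dist(clusters[a], clusters[b]))
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Toy example: two tight groups of 1-D "feature vectors".
groups = complete_linkage_clusters(
    [0.0, 0.1, 5.0, 5.2], threshold=1.0, dist=lambda a, b: abs(a - b))
```

With these toy inputs the items split into {0, 1} and {2, 3}; commodities clustered together this way would then share a source model whose frozen layers are reused and whose remaining layers are fine-tuned on the target FP.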
TinyML for UWB-radar based presence detection
Massimo Pavan, Armando Caltabiano, M. Roveri
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892925
Tiny Machine Learning (TinyML) is a novel research area that aims to design machine and deep learning models and algorithms that can be executed on tiny devices such as Internet-of-Things units, edge devices, or embedded systems. In this paper we introduce, for the first time in the literature, a TinyML solution for presence detection based on ultra-wideband (UWB) radar, a particularly promising radar technology for pervasive systems. To achieve this goal we introduce a novel family of tiny convolutional neural networks for processing UWB-radar data, characterized by a reduced memory footprint and computational demand so as to satisfy the severe technological constraints of tiny devices. UWB radars are particularly relevant in the presence-detection scenario since they do not acquire sensitive information about users (e.g., images, videos, or audio), hence preserving their privacy. The proposed solution has been successfully tested on a publicly available benchmark for indoor presence detection and on a real-world in-car presence-detection application.
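The "reduced memory footprint" constraint is usually checked with a parameter budget before training. The sketch below counts parameters for a hypothetical tiny 1-D CNN over UWB range profiles; the layer sizes are illustrative assumptions, not the architecture from the paper.

```python
# Back-of-the-envelope parameter/memory budget for a hypothetical tiny 1-D CNN.

def conv1d_params(in_ch, out_ch, k):
    return in_ch * out_ch * k + out_ch      # weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layers = [
    conv1d_params(1, 8, 5),     # conv1: 1 -> 8 channels, kernel 5
    conv1d_params(8, 16, 3),    # conv2: 8 -> 16 channels, kernel 3
    dense_params(16 * 32, 32),  # flatten (assume 32 time steps remain) -> dense
    dense_params(32, 2),        # presence / no-presence logits
]
total = sum(layers)
kib_fp32 = total * 4 / 1024     # 4 bytes per float32 weight
```

A budget of a few tens of KiB like this one is what makes execution on microcontroller-class devices plausible; quantization to int8 would shrink it by roughly 4x again.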
A Causal Network Construction Algorithm Based on Partial Rank Correlation on Time Series
J. Yang, Qiqi Chen
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9891908
Identifying causal relationships from observational time-series data is a key problem in dealing with complex dynamical systems such as those found in industry or climate science. Data-driven causal network construction in such systems is challenging since the data sets are often high-dimensional and nonlinear. In response to this challenge, this paper builds on partial rank correlation coefficients and proposes a new structure learning algorithm, TS-PRCS, suited to time-series causal network models. We make three main contributions. First, we prove that partial rank correlation can be used as a criterion for independence tests. Second, we combine partial rank correlation with constraint-based causal discovery methods and propose a causal network discovery algorithm (TS-PRCS) for time-series data based on partial rank correlation. Finally, the effectiveness of the algorithm is demonstrated in experiments on time-series data generated by a time-series causal network model. Compared with an existing algorithm, the proposed algorithm achieves better results on high-dimensional and nonlinear data systems, and it also shows good runtime performance. In particular, the algorithm has been applied to real data generated by a power plant. The experiments show that our method improves the ability to detect causality in time-series data and further advances causal network construction for time-series data.
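The abstract does not spell out the test statistic, but the standard first-order partial rank correlation, which such constraint-based methods build on, is the Spearman analogue of partial correlation: correlate the ranks of x and y while controlling for the ranks of one conditioning variable z. A minimal sketch (ties not handled):

```python
from math import sqrt

def ranks(xs):
    """Ranks starting at 1 (ties not handled; fine for illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

def partial_rank_corr(x, y, z):
    """First-order partial rank correlation of x and y controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

In a constraint-based skeleton search, an edge x–y would be removed whenever this statistic is close enough to zero for some conditioning variable z; higher-order conditioning sets use the same formula recursively.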
Localization of Concept Drift: Identifying the Drifting Datapoints
Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf, André Artelt, Barbara Hammer
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892374
The notion of concept drift refers to the phenomenon that the distribution underlying the observed data changes over time. As a consequence, machine learning models may become inaccurate and need adjustment. While methods exist to detect concept drift, to find change points in data streams, or to adjust models in the presence of observed drift, the problem of localizing drift, i.e., identifying it in data space, remains largely unsolved, in particular from a formal perspective. This problem is important, however, since it enables an inspection of the most prominent characteristics, e.g., the features in which drift manifests itself, and can therefore be used to make informed decisions, e.g., efficient updates of the training set of online learning algorithms, and to perform precise adjustments of the learning model. In this paper we present a general theoretical framework that reduces drift localization to a supervised machine learning problem. We construct a new drift localization method on top of it and demonstrate the usefulness of our theory and the performance of our algorithm by comparing it to other methods from the literature.
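The reduction works by labeling samples with the time window they came from and training a classifier to tell the windows apart: wherever the classifier can confidently assign points to one window, the distribution has changed there. A toy sketch with a 1-D histogram classifier standing in for a real model (the paper's concrete method and estimator are not reproduced here):

```python
# Label samples by window (0 = reference, 1 = current), fit a classifier,
# and flag regions the classifier can confidently assign to one window.

def drift_scores(reference, current, n_bins=10, lo=0.0, hi=1.0):
    """Per-bin P(window = current | x): values far from 0.5 localize drift."""
    def bin_of(x):
        b = int((x - lo) / (hi - lo) * n_bins)
        return min(max(b, 0), n_bins - 1)

    counts = [[0, 0] for _ in range(n_bins)]   # [reference, current] per bin
    for x in reference:
        counts[bin_of(x)][0] += 1
    for x in current:
        counts[bin_of(x)][1] += 1
    return [c1 / (c0 + c1) if c0 + c1 else 0.5 for c0, c1 in counts]
```

Any probabilistic classifier (forest, kernel method, neural network) can replace the histogram; the per-sample class probability plays the same role as the per-bin score here.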
Hearables: Artefact removal in Ear-EEG for continuous 24/7 monitoring
Edoardo Occhipinti, H. Davies, Ghena Hammour, Danilo P. Mandic
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892675
Ear-worn devices offer the opportunity to measure vital signs around the clock, without the need for a clinician. These devices are, however, prone to motion artefacts, so entire epochs of artefact-corrupted recordings are routinely discarded. This work aims to reduce the impact of artefacts introduced by common real-life daily activities such as talking, chewing, and walking while recording the electroencephalogram (EEG) from the ear canal. The approach employs multiple external sensors, such as microphones and an accelerometer, to capture the artefact. The proposed algorithm combines Noise-Assisted Multivariate Empirical Mode Decomposition (NA-MEMD) with Adaptive Noise Cancellation (ANC), where each pair of Intrinsic Mode Functions (IMFs) from the EEG and motion sensors within NA-MEMD is fed independently to multiple Normalised Least Mean Square (NLMS) adaptive filters. The resulting denoised IMFs are then summed to reconstruct the denoised EEG signal. Results across multiple subjects show that the denoised EEG signals have reduced power in the frequency range occupied by artefacts. Moreover, different sensors provide different denoising performance for the tested artefacts: the microphones are more sensitive to artefacts that cause internal motion within the ear canal, such as chewing, while the accelerometer is more suitable for artefacts arising from full-body movements, such as walking.
A hybrid algorithm for fuzzy clustering based on global and local membership degree
Bruno A. Pimentel, Jadson Crislan Santos Costa
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892394
The clustering task poses challenges that change with the data, so different algorithms have been proposed, each with its own bias on the data. In the fuzzy clustering approach, the most popular algorithm is Fuzzy C-Means (FCM), which uses a global view of the variables to calculate the degree of membership. By contrast, Multivariate Fuzzy C-Means (MFCM) uses a local view of the variables to calculate the degree of membership. In this work, we propose a new hybrid algorithm that combines the local and global view approaches. To this end, a new objective function based on a hybridization parameter is introduced. The experiments show the robustness and superiority of the proposed algorithm on real and synthetic datasets in most of the analyzed scenarios.
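The "global view" side of the hybrid is the classical FCM membership update, which weights a point's membership in each cluster by its distance to every centroid at once. The paper's hybrid objective and its hybridization parameter are not reproduced here; this sketch shows only the standard FCM ingredient:

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point w.r.t. each center (global view).
    m > 1 is the fuzzifier; m = 2 is the common default."""
    d = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
         for center in centers]
    if any(di == 0.0 for di in d):          # point coincides with a center
        return [1.0 if di == 0.0 else 0.0 for di in d]
    memb = []
    for i in range(len(centers)):
        s = sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(len(centers)))
        memb.append(1.0 / s)
    return memb
```

MFCM replaces these per-point memberships with per-variable ones; the hybrid algorithm interpolates between the two views via its hybridization parameter.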
KC2UM: Knowledge-Conversation Cyclic Utilization Mechanism for Knowledge-Grounded Dialogue Generation
Yajing Sun, Yue Hu, Luxi Xing, Wei Peng, Yuqiang Xie, Xingsheng Zhang
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892149
End-to-end open-domain dialogue systems suffer from generating inconsistent and repetitive responses. Existing dialogue models focus on unilaterally incorporating personalized knowledge into the dialogue to enhance the quality of the generated response. However, they ignore that incorporating personality-related information from the dialogue history into the personalized knowledge can boost subsequent dialogue quality. In this paper, a Knowledge-Conversation Cyclic Utilization Mechanism (KC2UM) is proposed to enhance dialogue quality. Specifically, a novel cyclic interaction module is designed to iteratively incorporate personalized knowledge into each conversation turn and to capture personality-related conversation information that enhances the semantic representation of the personalized knowledge. We represent the knowledge with semantic and utilization representations to keep track of personalized knowledge utilization. Experiments on two knowledge-grounded dialogue datasets show that our approach selects knowledge more accurately and generates more informative responses.
Rethinking the Feature Iteration Process of Graph Convolution Networks
Bisheng Tang, Xiaojun Chen, Dakui Wang, Zhendong Zhao
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892737
Node classification is a fundamental research problem in graph neural networks (GNNs), which use node features and labels to learn node embeddings in a low-dimensional space. Existing graph node classification approaches mainly study GNNs from global and local perspectives; research from the micro perspective, i.e., the features themselves, is comparatively scarce. In this paper, we prove that in deeper GCNs the features in the same dimension are updated with the same coefficient, limiting the expressiveness of deeper GCNs. To overcome this limitation of the deeper GCN model, we propose a zero-feature method (k-ZF) to train GCNs. Specifically, k-ZF randomly sets k initial feature values to zero, acting as a data rectifier and augmenter, and can be combined with GCN models and other GCN training techniques. Extensive experiments on three public datasets show that k-ZF significantly improves GCNs in the feature aspect and achieves competitive accuracy.
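The zeroing step itself is simple to sketch. The abstract leaves open whether the k zeroed entries are shared across nodes or drawn per node; the sketch below assumes per-node sampling, so treat it as an illustration of the idea rather than the authors' exact procedure:

```python
import random

def k_zero_features(features, k, seed=0):
    """k-ZF sketch: zero out k randomly chosen entries of each node's
    initial feature vector (acting as a data rectifier/augmenter).
    Returns a new matrix; the input is left untouched."""
    rng = random.Random(seed)
    out = []
    for row in features:
        idx = rng.sample(range(len(row)), k)
        new = list(row)
        for i in idx:
            new[i] = 0.0
        out.append(new)
    return out
```

Applied once to the input features before the usual GCN propagation, this perturbs the per-dimension update coefficients that the paper shows are otherwise identical across deeper layers.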
Predicting Human-Object Interactions in Egocentric Videos
Manuel Benavent-Lledó, Sergiu Oprea, John Alejandro Castro-Vargas, David Mulero-Pérez, J. G. Rodríguez
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892910
Egocentric videos provide a rich source of hand-object interactions that support action recognition. However, prior to action recognition, one may need to detect the presence of hands and objects in the scene. In this work, we propose an action estimation architecture based on the simultaneous detection of the hands and objects in the scene. For hand and object detection, we adapt the well-known YOLO architecture, leveraging its inference speed and accuracy, and experimentally determine the best-performing variant for our task. After obtaining the hand and object bounding boxes, we select the objects most likely to be interacted with, i.e., the objects closest to a hand. This rough estimate of the objects closest to a hand is a direct way to determine hand-object interaction. After identifying the scene, and given a set of per-object and global actions, we can determine the most suitable action being performed in each context.
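The closest-object selection reduces to a distance test between bounding boxes. A minimal sketch using box-centre distance (one reasonable choice; the paper does not specify the exact distance):

```python
def box_center(box):
    """Centre of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def closest_object(hand_box, object_boxes):
    """Index of the detected object whose box centre is nearest the hand."""
    hx, hy = box_center(hand_box)

    def d2(box):
        ox, oy = box_center(box)
        return (ox - hx) ** 2 + (oy - hy) ** 2

    return min(range(len(object_boxes)), key=lambda i: d2(object_boxes[i]))
```

The selected object's class, together with the per-object and global action sets, then narrows down which action is being performed.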
Contrastive Learning Based Visual Representation Enhancement for Multimodal Machine Translation
Shike Wang, Wen Zhang, Wenyu Guo, Dong Yu, Pengyuan Liu
2022 International Joint Conference on Neural Networks (IJCNN)
Pub Date: 2022-07-18 | DOI: 10.1109/IJCNN55064.2022.9892312
Multimodal machine translation (MMT) is a task that incorporates an extra image modality alongside the text to be translated. Previous work has studied the interaction between the two modalities and investigated whether the visual modality is needed. However, few works focus on giving the models better and more effective visual representations as input. We argue that the performance of MMT systems improves when better visual representations are fed into them. To investigate this idea, we introduce mT-ICL, a multimodal Transformer model with image contrastive learning. The contrastive objective is optimized to enhance the representation ability of the image encoder so that it can generate better and more adaptive visual representations. Experiments show that mT-ICL significantly outperforms a strong baseline and achieves a new SOTA on most test sets of English-to-German and English-to-French. Further analysis reveals that the visual modality works as more than a regularization method under the contrastive learning framework.
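The abstract does not give mT-ICL's exact objective, but image contrastive learning is typically trained with an InfoNCE-style loss: matched pairs (the diagonal of a similarity matrix) are pulled together while mismatched pairs are pushed apart. A minimal, framework-free sketch of that loss:

```python
from math import exp, log

def info_nce(sim_matrix, temperature=0.1):
    """InfoNCE over a batch similarity matrix; sim_matrix[i][i] is the
    positive pair for row i, all other entries in the row are negatives."""
    n = len(sim_matrix)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim_matrix[i]]
        m = max(logits)                                  # stabilise the log-sum-exp
        log_denom = m + log(sum(exp(l - m) for l in logits))
        total += -(logits[i] - log_denom)                # -log softmax at the positive
    return total / n
```

Minimizing this loss drives each image embedding toward its own augmented view (or paired caption) and away from the rest of the batch, which is how the image encoder's representations are sharpened before feeding the translation model.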