Faster RBF Network Learning Utilizing Singular Regions
Pub Date: 2019-02-19 | DOI: 10.5220/0007367205010508
Seiya Satoh, R. Nakano
There are two ways to learn radial basis function (RBF) networks: one-stage and two-stage learning. Recently, a very powerful one-stage learning method called RBF-SSF was proposed; it can stably find a series of excellent solutions by making good use of singular regions, and it monotonically decreases training error as hidden units are added. RBF-SSF was built by applying the SSF (singularity stairs following) paradigm, originally proposed with success for multilayer perceptrons, to RBF networks. Although RBF-SSF has a strong capability to find excellent solutions, it requires a lot of time, mainly because it computes the Hessian. This paper proposes a faster version of RBF-SSF, called RBF-SSF(pH), that introduces partial calculation of the Hessian. Experiments on two datasets showed that RBF-SSF(pH) ran as fast as typical one-stage learning methods while keeping the excellent solution quality.
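The abstract does not spell out which Hessian entries RBF-SSF(pH) retains, so the sketch below only illustrates the general idea of a partial Hessian: when one hidden unit is of interest, differentiate the training loss with respect to that unit's parameters alone, shrinking the cost from O(N^2) over all N network parameters to O(k^2) over the k parameters of one unit. All function and variable names here are ours, not the paper's.

```python
import numpy as np

def rbf_output(X, centers, widths, weights):
    # RBF network output: sum_j w_j * exp(-||x - c_j||^2 / (2 s_j^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return (np.exp(-d2 / (2.0 * widths ** 2)) * weights).sum(axis=1)

def unit_mse(params, X, y, centers, widths, weights, idx):
    # Training MSE with only the parameters of hidden unit `idx`
    # replaced by `params` = (center..., width, weight).
    c, s, w = centers.copy(), widths.copy(), weights.copy()
    c[idx], s[idx], w[idx] = params[:-2], params[-2], params[-1]
    r = rbf_output(X, c, s, w) - y
    return 0.5 * (r ** 2).mean()

def block_hessian(f, p, eps=1e-4):
    # Finite-difference Hessian restricted to one unit's k parameters:
    # O(k^2) function evaluations instead of O(N^2) over all parameters.
    k = len(p)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            d_i, d_j = np.eye(k)[i] * eps, np.eye(k)[j] * eps
            H[i, j] = (f(p + d_i + d_j) - f(p + d_i - d_j)
                       - f(p - d_i + d_j) + f(p - d_i - d_j)) / (4 * eps ** 2)
    return H

# Example: Hessian block for unit 0 of a two-unit network on toy data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 2)), rng.normal(size=50)
centers, widths, weights = rng.normal(size=(2, 2)), np.ones(2), np.ones(2)
p0 = np.concatenate([centers[0], [widths[0], weights[0]]])
H = block_hessian(lambda p: unit_mse(p, X, y, centers, widths, weights, 0), p0)
```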
AveRobot: An Audio-visual Dataset for People Re-identification and Verification in Human-Robot Interaction
Pub Date: 2019-02-19 | DOI: 10.5220/0007690902550265
M. Marras, Pedro A. Marín-Reyes, J. Lorenzo-Navarro, M. C. Santana, G. Fenu
Intelligent technologies have pervaded our daily life, making it easier for people to complete their activities. One emerging application involves the use of robots to assist people in various tasks (e.g., visiting a museum). In this context, it is crucial to enable robots to correctly identify people. Existing robots often use facial information to establish the identity of a person of interest. However, the face alone may not offer enough relevant information due to variations in pose, illumination, resolution, and recording distance. Other biometric modalities, such as the voice, can improve recognition performance under these conditions. However, existing datasets in robotic scenarios usually do not include the audio cue and tend to suffer from one or more limitations: most of them are acquired under controlled conditions, are limited in the number of identities or samples per user, are collected by a single recording device, and/or are not freely available. In this paper, we propose AveRobot, an audio-visual dataset of 111 participants vocalizing short sentences under robot assistance scenarios. The collection took place in a three-floor building through eight different cameras with built-in microphones. Face and voice re-identification and verification performance was evaluated on this dataset with deep learning baselines and compared against audio-visual datasets from diverse scenarios. The results showed that AveRobot is a challenging dataset for people re-identification and verification.
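As a concrete illustration of the verification protocol such a dataset is evaluated with, a minimal embedding-based baseline might score trial pairs by cosine similarity and report the equal error rate. The helper names below are ours, and the paper's actual deep learning baselines may differ.

```python
import numpy as np
from sklearn.metrics import roc_curve

def cosine_scores(enroll, probe):
    # Cosine similarity between row-wise embedding matrices (one trial per row).
    e = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe, axis=1, keepdims=True)
    return (e * p).sum(axis=1)

def equal_error_rate(labels, scores):
    # EER: operating point where the false-accept rate equals the
    # false-reject rate; labels are 1 for same-identity trials, else 0.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    i = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[i] + fnr[i]) / 2
```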
Data for Image Recognition Tasks: An Efficient Tool for Fine-Grained Annotations
Pub Date: 2019-02-19 | DOI: 10.5220/0007688709000907
Marco Filax, Tim Gonschorek, F. Ortmeier
Using large datasets is essential for machine learning. In practice, training a machine learning algorithm requires hundreds of samples. Multiple off-the-shelf datasets from the scientific domain exist to benchmark new approaches. However, when machine learning algorithms transition to industry, e.g., for a particular image classification problem, hundreds of special-purpose images must be collected and annotated in laborious manual work. In this paper, we present a novel system to decrease the effort of annotating such large image sets. To this end, we generate 2D bounding boxes from minimal 3D annotations using the known location and orientation of the camera: we annotate a particular object of interest in 3D once and project the annotation onto every frame of a video stream. The proposed approach is designed to work with off-the-shelf hardware. We demonstrate its applicability with a real-world example, generating a more extensive dataset than available in other works for a particular industrial use case: fine-grained recognition of items within grocery stores. Further, we make our dataset, consisting of over 60,000 images, available to the interested vision community. Some images were taken under ideal conditions for training, while others were taken with the proposed approach in the wild.
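The core geometric step, projecting a 3D-annotated box into each frame with the known camera pose, can be sketched in a few lines. `K`, `R`, and `t` denote hypothetical camera intrinsics and a world-to-camera rotation/translation; the paper's actual pipeline may use different conventions.

```python
import numpy as np

def project_box(corners_world, K, R, t):
    """Project the 8 corners of a 3D box (world frame) into the image
    and return the enclosing 2D box (x_min, y_min, x_max, y_max)."""
    cam = R @ corners_world.T + t.reshape(3, 1)   # world -> camera frame
    uv = K @ cam                                  # pinhole projection
    uv = uv[:2] / uv[2]                           # perspective divide
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()

# Example with an axis-aligned unit cube 4 m in front of the camera.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 4.0])
corners = np.array([[x, y, z] for x in (-1, 1)
                    for y in (-1, 1) for z in (-1, 1)], dtype=float)
print(project_box(corners, K, R, t))
```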
Simple Domain Adaptation for CAD based Object Recognition
Pub Date: 2019-02-19 | DOI: 10.5220/0007346504290437
Kripasindhu Sarkar, D. Stricker
We present a simple method of domain adaptation between synthetic images and real images, based on high-quality rendering of 3D models and correlation alignment. Using this method, we solve the problem of 3D object recognition in 2D images by fine-tuning existing pretrained CNN models for the object categories using the rendered images. Experimentally, we show that our rendering pipeline, together with correlation alignment, improves the accuracy of existing CNN-based recognizers trained on images from a canonical renderer by a large margin. Using the same idea, we present a general image classifier for common objects that is trained only on 3D models from publicly available databases, and we show that a small number of training models is sufficient to capture the variations within and across classes.
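The abstract names correlation alignment; the widely used CORAL recipe (whiten the source feature covariance, then re-color it with the target covariance) can be sketched as follows, assuming row-wise feature matrices. This is the generic algorithm, not necessarily the authors' exact variant.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral(source, target, reg=1.0):
    """Align source features (n_s x d) to target features (n_t x d):
    whiten with Cs^(-1/2), re-color with Ct^(1/2)."""
    cs = np.cov(source, rowvar=False) + reg * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + reg * np.eye(target.shape[1])
    whiten = fractional_matrix_power(cs, -0.5)
    recolor = fractional_matrix_power(ct, 0.5)
    return (source @ whiten @ recolor).real
```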
Fourier Spectral Domain Functional Principal Component Analysis of EEG Signals
Pub Date: 2019-02-19 | DOI: 10.1007/978-3-030-40014-9_1
Shengkun Xie, A. Lawniczak
Forecasting Hotel Room Sales within Online Travel Agencies by Combining Multiple Feature Sets
Pub Date: 2019-02-19 | DOI: 10.5220/0007383205650573
Gizem Aras, G. Ayhan, Mehmet Sarıkaya, A. A. Tokuç, C. O. Sakar
Hotel room sales prediction from previous booking data is a prominent research topic for the online travel agency (OTA) sector. Various approaches have been proposed to predict hotel room sales over different prediction horizons, such as yearly demand or the daily number of reservations. An OTA website includes offers from many companies for the same hotel, and the position of a company's offer on the OTA website depends on the amount the company bids per click. Therefore, accurately predicting the sales amount for a given bid is a crucial need in revenue and cost management for companies in the sector. In this paper, we forecast the next day's sales amount in order to provide an estimate of the daily revenue generated per hotel. An important contribution of our study is an enriched dataset constructed by combining the most informative features proposed in various related studies on hotel sales prediction. Moreover, we enrich this dataset with a set of OTA-specific features that capture the position of the company's offers relative to those of its competitors on a travel metasearch engine website. We provide a real application on the hotel room sales data of a large OTA in Turkey. The comparative results show that enriching the input representation with the OTA-specific features increases the generalization ability of the prediction models, and that tree-based boosting algorithms achieve the best results on this task.
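A minimal version of the described setup, a chronological split plus a tree-based boosting regressor over combined booking and OTA-position features, might look like the sketch below. All column names are invented placeholders; the paper's feature sets and models are richer.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# df: one row per hotel per day; these column names are illustrative only.
booking_feats = ["price", "season", "day_of_week", "past_7d_sales"]
ota_feats = ["rank_on_page", "bid_amount", "competitor_min_price"]

def next_day_forecast(df, features, target="next_day_sales"):
    # Chronological split: fit on the earliest 80% of days, test on the rest.
    df = df.sort_values("date")
    cut = int(len(df) * 0.8)
    train, test = df.iloc[:cut], df.iloc[cut:]
    model = GradientBoostingRegressor().fit(train[features], train[target])
    return mean_absolute_error(test[target], model.predict(test[features]))

# Comparing the two calls below would show the effect of the OTA features:
# next_day_forecast(df, booking_feats) vs.
# next_day_forecast(df, booking_feats + ota_feats)
```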
Cascaded Acoustic Group and Individual Feature Selection for Recognition of Food Likability
Pub Date: 2019-02-19 | DOI: 10.5220/0007683708810886
Dara Pir
This paper presents the novel Cascaded acoustic Group and Individual Feature Selection (CGI-FS) method for automatic recognition of food likability ratings, addressed in the ICMI 2018 Eating Analysis and Tracking Challenge's Likability Sub-Challenge. Employing the speech and video recordings of the iHEARu-EAT database, the Likability Sub-Challenge attempts to recognize self-reported binary labels, 'Neutral' and 'Like', assigned by subjects to food they consumed while speaking. CGI-FS uses an audio approach and performs a sequence of two feature selection operations, considering the acoustic feature space first in groups and then individually. In CGI-FS, an acoustic group feature is defined as the collection of features generated by applying a single statistical functional to a specified set of audio low-level descriptors. We investigate the performance of CGI-FS using four different classifiers and evaluate the relevance of group features to the task. All four CGI-FS systems outperform the Likability Sub-Challenge baseline on the iHEARu-EAT development data, with the best achieving a 9.8% relative improvement in Unweighted Average Recall over it.
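The abstract specifies the two-stage structure but not the selection criteria, so the following is only a schematic of a cascaded group-then-individual selector: score each group of feature columns by cross-validated accuracy, keep the best groups, then rank the surviving features individually. Names and thresholds are ours.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def cgi_fs(X, y, groups, n_groups=10, k_individual=200):
    """Stage 1: score each group (array of column indices) by
    cross-validated accuracy and keep the best n_groups.
    Stage 2: select individual features among the survivors."""
    scores = [cross_val_score(LinearSVC(dual=False), X[:, g], y, cv=5).mean()
              for g in groups]
    keep = np.concatenate([groups[i] for i in np.argsort(scores)[-n_groups:]])
    selector = SelectKBest(f_classif, k=min(k_individual, len(keep)))
    return keep, selector.fit(X[:, keep], y)
```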
Automatic Perception Enhancement for Simulated Retinal Implants
Pub Date: 2019-02-19 | DOI: 10.5220/0007695409080914
Johannes Steffen, Georg Hille, Klaus D. Tönnies
This work addresses the automatic enhancement of the visual percepts of virtual patients with retinal implants. Specifically, we cast the task as an image transformation problem within an artificial neural network. The neurophysiological model of Nanduri et al. (2012) was implemented as a tensor network to simulate a virtual patient's visual percept and was used together with an image transformation network to perform end-to-end learning on an image reconstruction task and a classification task. The image reconstruction task was evaluated on the MNIST data set and yielded plausible learned transformations while halving the dissimilarity (mean squared error) between an input image and its simulated visual percept. Furthermore, the classification task was evaluated on the CIFAR-10 data set. Experiments show that classification accuracy increases by approximately 12.9% when a suitable input image transformation is learned.
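The training loop implied by the abstract, an image transformation network followed by a fixed differentiable percept simulator with a reconstruction loss, can be sketched as below. `TransformNet` is a stand-in architecture and `percept_model` stands in for the tensor implementation of the Nanduri et al. (2012) model; both are our assumptions.

```python
import torch
import torch.nn as nn

class TransformNet(nn.Module):
    # Small stand-in for the image transformation network T.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

def train_step(transform, percept_model, x, optimizer):
    """One end-to-end reconstruction step: the fixed, differentiable
    percept simulator P follows the transformation network T, and the
    loss compares the simulated percept with the original image."""
    optimizer.zero_grad()
    percept = percept_model(transform(x))          # P(T(x))
    loss = nn.functional.mse_loss(percept, x)
    loss.backward()                                # gradients flow through P into T
    optimizer.step()
    return loss.item()
```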
Annealing by Increasing Resampling in the Unified View of Simulated Annealing
Pub Date: 2019-02-19 | DOI: 10.5220/0007380701730180
Yasunobu Imamura, N. Higuchi, T. Shinohara, K. Hirata, T. Kuboyama
Annealing by Increasing Resampling (AIR) is a stochastic hill-climbing optimization method that evaluates its objective function on resamples of increasing size. In this paper, we introduce a unified view of conventional Simulated Annealing (SA) and AIR. In this view, both SA and AIR are generalized to stochastic hill-climbing for objective functions with stochastic fluctuations, following the logit and probit models, respectively. Since the logit function is approximated by the probit function, AIR can be regarded as an approximation of SA. Experimental results on sparse pivot selection and annealing-based clustering also support that AIR approximates SA. Moreover, when the objective function requires a large number of samples, AIR is much faster than SA without sacrificing the quality of the results.
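The relationship the paper formalizes can be made concrete by putting the two acceptance rules side by side: SA randomizes comparisons with an explicit temperature, while AIR gets its randomness from evaluating the objective on a resample whose growing size plays the role of cooling. The toy objective below is ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, batch):
    # Toy objective: mean squared deviation of a location parameter.
    return ((batch - theta) ** 2).mean()

def sa_accept(delta, temperature):
    # SA rule: always accept improvements (delta <= 0); accept a worse
    # candidate with probability exp(-delta / T).
    return delta <= 0 or rng.random() < np.exp(-delta / temperature)

def air_accept(candidate, current, data, sample_size):
    # AIR rule: compare the two candidates on a resample of the data.
    # A small sample makes the comparison noisy (high "temperature");
    # increasing sample_size plays the role of cooling.
    idx = rng.choice(len(data), size=sample_size, replace=True)
    return loss(candidate, data[idx]) <= loss(current, data[idx])
```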
Using Recurrent Neural Networks for Action and Intention Recognition of Car Drivers
Pub Date: 2019-02-19 | DOI: 10.5220/0007682502320242
Martin Torstensson, B. Durán, Cristofer Englund
Traffic situations leading up to accidents have been shown to be greatly affected by human error. To reduce such errors, warning systems such as Driver Alert Control, Collision Warning, and Lane Departure Warning have been introduced. However, there is still room for improvement, both in the timing of when a warning should be given and in how far in advance a hazardous situation can be detected. Two factors that affect when a warning should be given are the environment and the actions of the driver. This study proposes an artificial neural network-based approach consisting of a convolutional neural network and a recurrent neural network with long short-term memory to detect and predict different actions of a driver inside a vehicle. The network achieved an accuracy of 84% when predicting the driver's actions in the next frame, and an accuracy of 58% when predicting 20 frames ahead, at a sampling rate of approximately 30 frames per second.
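A compact PyTorch stand-in for the described architecture, a per-frame CNN feeding an LSTM that classifies the action at the next frame, is sketched below; the layer sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DriverActionNet(nn.Module):
    """A CNN encodes each frame; an LSTM models the temporal context;
    a linear head predicts the action logits for the next frame."""
    def __init__(self, n_actions, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])             # prediction from last step

# Example: a batch of 4 clips, 30 frames each (about one second of video).
model = DriverActionNet(n_actions=5)
logits = model(torch.randn(4, 30, 3, 64, 64))
```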