Object Classification using Deep Learning on Extremely Low-Resolution Time-of-Flight Data
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615877
Ana Daysi Ruvalcaba-Cardenas, T. Scoleri, Geoffrey Day
This paper proposes two novel deep learning models for 2D and 3D classification of objects in extremely low-resolution time-of-flight imagery. The models have been developed to suit contemporary range imaging hardware based on a recently fabricated Single Photon Avalanche Diode (SPAD) camera with 64 × 64 pixel resolution. Since the camera is the first prototype of its kind, only a small data set has been collected so far, which makes training models challenging. To bypass this hurdle, transfer learning is applied to the widely used VGG-16 convolutional neural network (CNN), with supplementary layers added specifically to handle SPAD data. This classifier and the renowned Faster-RCNN detector offer benchmark models for comparison to a newly created 3D CNN operating on time-of-flight data acquired by the SPAD sensor. Another contribution of this work is a shot noise removal algorithm, which is particularly useful for mitigating the camera's sensitivity in situations of excessive lighting. Models have been tested in both low-light indoor settings and outdoor daytime conditions, on eight objects exhibiting small physical dimensions, low reflectivity and featureless structures, located at ranges from 25m to 700m. Despite these adverse factors, the proposed 2D model has achieved 95% average precision and recall, with higher accuracy for the 3D model.
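A minimal sketch of the transfer-learning setup the abstract describes, in Keras/TensorFlow. The supplementary layer sizes, the eight-class softmax head and the replication of SPAD intensity frames to three channels are assumptions for illustration, not the authors' exact architecture.

```python
# Transfer-learning sketch (Keras/TensorFlow). Assumptions: SPAD frames are
# replicated to 3 channels; the dense head sizes are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(64, 64, 3))  # 64 x 64 SPAD frames
base.trainable = False  # freeze pretrained features; the small data set rules out full training

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # supplementary layers for SPAD data
    layers.Dropout(0.5),
    layers.Dense(8, activation="softmax"),  # eight target objects
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```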
{"title":"Object Classification using Deep Learning on Extremely Low-Resolution Time-of-Flight Data","authors":"Ana Daysi Ruvalcaba-Cardenas, T. Scoleri, Geoffrey Day","doi":"10.1109/DICTA.2018.8615877","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615877","url":null,"abstract":"This paper proposes two novel deep learning models for 2D and 3D classification of objects in extremely low-resolution time-of-flight imagery. The models have been developed to suit contemporary range imaging hardware based on a recently fabricated Single Photon Avalanche Diode (SPAD) camera with 64 χ 64 pixel resolution. Being the first prototype of its kind, only a small data set has been collected so far which makes it challenging for training models. To bypass this hurdle, transfer learning is applied to the widely used VGG-16 convolutional neural network (CNN), with supplementary layers added specifically to handle SPAD data. This classifier and the renowned Faster-RCNN detector offer benchmark models for comparison to a newly created 3D CNN operating on time-of-flight data acquired by the SPAD sensor. Another contribution of this work is the proposed shot noise removal algorithm which is particularly useful to mitigate the camera sensitivity in situations of excessive lighting. Models have been tested in both low-light indoor settings and outdoor daytime conditions, on eight objects exhibiting small physical dimensions, low reflectivity, featureless structures and located at ranges from 25m to 700m. Despite antagonist factors, the proposed 2D model has achieved 95% average precision and recall, with higher accuracy for the 3D model.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128587255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online Relational Manifold Learning for Multiview Segmentation in Echocardiography
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615773
G. Belous, Andrew Busch, D. Rowlands, Yongsheng Gao
Accurate delineation of the left ventricle (LV) endocardial border in echocardiography is of vital importance for the diagnosis and treatment of heart disease. Effective segmentation of the LV is challenging due to low contrast, signal dropout and acoustic noise. In situations where low-level and region-based image cues are unable to define the LV boundary, shape prior models are critical for inferring shape. These models perform well when there is low variability in the underlying shape subspace and the shape instance produced by appearance cues does not contain gross errors; in the absence of these conditions, however, results are often much poorer. In this paper, we first propose a shape model to overcome the problem of modelling complex shape subspaces. Our method connects the implicit relationship between image features and shape by extending graph-regularized sparse nonnegative matrix factorization (NMF) to jointly learn the structure of, and connection between, two low-dimensional manifolds comprising image features and shapes, respectively. We extend conventional NMF learning to an online learning-based approach in which the input image is used to steer the learning and connection of each manifold towards the most relevant subspace regions. This ensures robust shape inference and a shape model constructed from contextually relevant shapes. A fully automatic segmentation approach using a probabilistic framework is then proposed to detect the LV endocardial border. Our method is applied to a diverse dataset containing multiple views of the LV. Results show the effectiveness of our approach compared to state-of-the-art methods.
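As a simplified illustration of the factorization at the core of this method, the sketch below uses scikit-learn's plain NMF to project image features onto a low-dimensional nonnegative basis; the paper's graph-regularized, sparse, online variant with a coupled shape manifold is substantially more involved.

```python
# Plain NMF (scikit-learn) factorizing a nonnegative image-feature matrix into a
# low-dimensional basis; a stand-in for the paper's graph-regularized sparse NMF.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 500))                 # 100 samples x 500 nonnegative features
nmf = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)                   # manifold coordinates of each sample
H = nmf.components_                        # basis vectors spanning the feature subspace
# X is approximated by W @ H, with all factors nonnegative
```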
{"title":"Online Relational Manifold Learning for Multiview Segmentation in Echocardiography","authors":"G. Belous, Andrew Busch, D. Rowlands, Yongsheng Gao","doi":"10.1109/DICTA.2018.8615773","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615773","url":null,"abstract":"Accurate delineation of the left ventricle (LV) endocardial border in echocardiography is of vital importance for the diagnosis and treatment of heart disease. Effective segmentation of the LV is challenging due to low contrast, signal dropout and acoustic noise. In the situation where low level and region-based image cues are unable to define the LV boundary, shape prior models are critical to infer shape. These models perform well when there is low variability in the underlying shape subspace and the shape instance produced by appearance cues does not contain gross errors, however in the absence of these conditions results are often much poorer. In this paper, we first propose a shape model to overcome the problem of modelling complex shape subspaces. Our method connects the implicit relationship between image features and shape by extending graph regularized sparse nonnegative matrix factorization (NMF) to jointly learn the structure and connection between two low dimensional manifolds comprising image features and shapes, respectively. We extend conventional NMF learning to an online learning-based approach where the input image is used to leverage the learning and connection of each manifold to the most relevant subspace regions. This ensures robust shape inference and a shape model constructed from contextually relevant shapes. A fully automatic segmentation approach using a probabilistic framework is then proposed to detect the LV endocardial border. Our method is applied to a diverse dataset that contains multiple views of the LV. Results show the effectiveness of our approach compared to state-of-the-art methods.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128932684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Enhancement for Face Recognition in Adverse Environments
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615793
D. Kamenetsky, Sau Yee Yiu, Martyn Hole
Face recognition in adverse environments, such as at long distances or in low-light conditions, remains a challenging task for current state-of-the-art face matching algorithms. Facial images taken in these conditions are often of low resolution and low quality due to the effects of atmospheric turbulence and/or an insufficient amount of light reaching the camera. In this work, we use an atmospheric turbulence mitigation algorithm (MPE) to enhance low-resolution RGB videos of faces captured either at long distances or in low-light conditions. Due to its interactive nature, MPE is tuned to work well in each specific environment. We also propose three image enhancement techniques that further improve the images produced by MPE: two for low-light imagery (MPEf and fMPE) and one for long-distance imagery (MPEh). Experimental results show that all three methods significantly improve image quality and face recognition performance, allowing effective face recognition in almost complete darkness (at close range) or at distances up to 200m (in daylight).
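MPE itself is not publicly available, but a generic stand-in for frame-based enhancement conveys the idea: temporal averaging over registered frames to suppress turbulence-induced distortion, followed by CLAHE for low-light contrast (OpenCV). The window length and CLAHE parameters are illustrative assumptions, not MPE's actual processing.

```python
# Generic stand-in for turbulence/low-light enhancement: temporal mean of
# registered frames to suppress jitter, then contrast-limited adaptive
# histogram equalization. Parameters are assumed, not tuned.
import cv2
import numpy as np

def enhance(frames):                       # frames: list of registered BGR frames
    avg = np.mean(np.stack(frames), axis=0).astype(np.uint8)   # temporal average
    gray = cv2.cvtColor(avg, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)               # boosted local contrast for dark scenes
```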
{"title":"Image Enhancement for Face Recognition in Adverse Environments","authors":"D. Kamenetsky, Sau Yee Yiu, Martyn Hole","doi":"10.1109/DICTA.2018.8615793","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615793","url":null,"abstract":"Face recognition in adverse environments, such as at long distances or in low light conditions, remains a challenging task for current state-of-the-art face matching algorithms. The facial images taken in these conditions are often low resolution and low quality due to the effects of atmospheric turbulence and/or insufficient amount of light reaching the camera. In this work, we use an atmospheric turbulence mitigation algorithm (MPE) to enhance low resolution RGB videos of faces captured either at long distances or in low light conditions. Due to its interactive nature, MPE is tuned to work well in each specific environment. We also propose three image enhancement techniques that further improve the images produced by MPE: two for low light imagery (MPEf and fMPE) and one for long distance imagery (MPEh). Experimental results show that all three methods significantly improve the image quality and face recognition performance, allowing effective face recognition in almost complete darkness (at close range) or at distances up to 200m (in daylight).","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126059532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615791
Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis
Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the quality of life of PD patients. Generally, trained experts need to review a patient's gait for clinical diagnosis, which is time-consuming and subjective. Automatic FoG identification from videos provides a promising solution to these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task, as FoG is very subtle and can easily be overlooked amid irrelevant motion. In this paper, we propose a novel action recognition algorithm, the convolutional 3D attention network (C3DAN), which addresses this issue by learning an informative region for more effective recognition. The network consists of two main parts: a Spatial Attention Network (SAN) and a 3-dimensional convolutional network (C3D). SAN generates an attention region from coarse to fine, while C3D extracts discriminative features. Our proposed approach is able to localize the attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate C3DAN on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained a sensitivity of 68.2%, a specificity of 80.8% and an accuracy of 79.3%, outperforming several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies to detect FoG from clinical videos.
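A hedged sketch of the C3D half of the architecture in Keras: a small 3D-convolutional clip classifier. The learned spatial-attention crop (SAN) that precedes it in C3DAN is omitted, and the clip size, filter counts and two-class head are assumptions.

```python
# C3D-style clip classifier (Keras). The SAN attention module of C3DAN is omitted;
# clip length, resolution and filter counts are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv3D(64, 3, activation="relu", padding="same",
                  input_shape=(16, 112, 112, 3)),   # 16-frame RGB clip
    layers.MaxPooling3D(pool_size=(1, 2, 2)),       # pool space only at first
    layers.Conv3D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling3D(pool_size=(2, 2, 2)),       # then pool time and space
    layers.Conv3D(256, 3, activation="relu", padding="same"),
    layers.GlobalAveragePooling3D(),
    layers.Dense(2, activation="softmax"),          # FoG vs. normal gait
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```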
{"title":"Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition","authors":"Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis","doi":"10.1109/DICTA.2018.8615791","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615791","url":null,"abstract":"Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the life quality of PD patients. Generally trained experts need to review the gait of a patient for clinical diagnosis, which is time consuming and subjective. Nowadays, automatic FoG identification from videos provides a promising solution to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task as FoG is very subtle and can be easily overlooked when being interfered with by irrelevant motion. In this paper, we propose a novel action recognition algorithm, namely convolutional 3D attention network (C3DAN), to address this issue by learning an informative region for more effective recognition. The network consists of two main parts: Spatial Attention Network (SAN) and 3-dimensional convolutional network (C3D). SAN aims to generate an attention region from coarse to fine, while C3D extracts discriminative features. Our proposed approach is able to localize attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate our proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained sensitivity of 68.2%, specificity of 80.8% and accuracy of 79.3%, which outperformed several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies detecting FoG from clinical videos.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122837838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clearing Multiview Structure Graph from Inconsistencies
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615787
S. Kabbour, Pierre-Yves Richard
Dealing with repetitive patterns in images proves difficult in multiview structure from motion. Previous work in the field suggests that this problem can be solved by removing inconsistent rotations from the visual graph that represents pairwise relations between images. We therefore present a simple and rather effective cycle-based algorithm for clearing the graph. Since generating all cycles within the graph is computationally infeasible in most cases, we choose to verify only the cycles we need, without relying on the spanning-tree method, which places undue emphasis on certain edges.
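A rough illustration of cycle-based consistency checking with networkx and scipy: compose the pairwise rotations around each cycle in a cycle basis and flag cycles whose composition strays from the identity. The edge attribute name, orientation convention and 5-degree threshold are assumptions, and the basis-cycle simplification does not capture the paper's deliberate avoidance of spanning-tree-derived cycles.

```python
# Compose relative rotations around each basis cycle; a consistent cycle should
# compose to (near) identity. Edge attribute "rot" (scipy Rotation, oriented from
# the lower-indexed to the higher-indexed node) and the threshold are assumptions.
import networkx as nx
import numpy as np
from scipy.spatial.transform import Rotation as R

def inconsistent_cycles(G, thresh_deg=5.0):
    bad = []
    for cycle in nx.cycle_basis(G):
        total = R.identity()
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            rel = G.edges[a, b]["rot"]               # pairwise rotation estimate
            total = (rel if a < b else rel.inv()) * total
        if np.degrees(total.magnitude()) > thresh_deg:
            bad.append(cycle)                        # cycle contains a bad rotation
    return bad
```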
{"title":"Clearing Multiview Structure Graph from Inconsistencies","authors":"S. Kabbour, Pierre-Yves Richard","doi":"10.1109/DICTA.2018.8615787","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615787","url":null,"abstract":"Dealing with repetitive patterns in images proves to be difficult in Multiview structure from motion. Previous work in the field suggests that this problem can be solved by clearing inconsistent rotations in the visual graph that represents pairwise relations between images. So we present a simple and rather effective algorithm, to clear the graph based on cycles. While trying to generate all cycles within the graph is computationally impossible in most cases, we choose to verify only the cycles that we need, and without relying on the spanning tree method because it puts a big emphasis on certain edges.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124546928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Representation using Bag of Perceptual Curve Features
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615816
Elham Etemad, Q. Gao
Many applications, such as augmented or mixed reality, have limited training data and computing power, which makes convolutional neural networks inapplicable in those domains. In our method, we extract the perceptual edge map of the image and group its perceptual structure-based edge elements according to gestalt psychology. The connecting points of these groups, called curve partitioning points (CPPs), are descriptive areas of the image and are used for image representation. Global perceptual image features and local image representation methods are combined to encode the image from the generated bag of CPPs using spatial pyramid matching. Experiments on multi-label and single-label datasets show the superiority of the proposed method.
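To make the encoding step concrete, here is a generic spatial-pyramid bag-of-features sketch (numpy/scikit-learn) in which keypoint locations and descriptors stand in for the paper's CPPs; the codebook is assumed to be a KMeans model fitted offline on training descriptors.

```python
# Spatial-pyramid bag-of-features encoding. `points` are (x, y) keypoint positions,
# `descs` their descriptors, `codebook` a fitted sklearn KMeans; all stand-ins for
# the paper's CPP-based features.
import numpy as np
from sklearn.cluster import KMeans

def spm_encode(points, descs, codebook, img_w, img_h, levels=2):
    words = codebook.predict(descs)                 # quantize descriptors to visual words
    k = codebook.n_clusters
    hists = []
    for lv in range(levels + 1):
        cells = 2 ** lv                             # cells x cells grid at this level
        col = points[:, 0] * cells // img_w         # cell column of each keypoint
        row = points[:, 1] * cells // img_h         # cell row of each keypoint
        for i in range(cells):
            for j in range(cells):
                h = np.bincount(words[(row == i) & (col == j)], minlength=k)
                hists.append(h / max(h.sum(), 1))   # L1-normalized cell histogram
    return np.concatenate(hists)

# codebook = KMeans(n_clusters=200, random_state=0).fit(train_descs)  # built offline
```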
{"title":"Image Representation using Bag of Perceptual Curve Features","authors":"Elham Etemad, Q. Gao","doi":"10.1109/DICTA.2018.8615816","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615816","url":null,"abstract":"There are many applications such as augmented or mixed reality with limited training data and computing power which results in inapplicability of convolutional neural networks in those domains. In this method, we have extracted the perceptual edge map of the image and grouped its perceptual structure-based edge elements according to gestalt psychology. The connecting points of these groups, called curve partitioning points (CPPs), are descriptive areas of the image and are utilized for image representation. In this method, the global perceptual image features, and local image representation methods are combined to encode the image according to the generated bag of CPPs using the spatial pyramid matching. The experiments on multi-label and single-label datasets show the superiority of the proposed method.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128797825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Processing for Traceability: A System Prototype for the Southern Rock Lobster (SRL) Supply Chain
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615842
Son Anh Vo, J. Scanlan, L. Mirowski, P. Turner
This paper describes how conventional image processing techniques can be applied to the grading of Southern Rock Lobsters (SRL) to produce a high-quality data layer which could serve as an input to product traceability. The research is part of a broader investigation into designing a low-cost biometric identification solution for use along the entire lobster supply chain. In approaching the image processing for lobster grading, a key consideration is to develop a system capable of using the low-cost consumer-grade cameras readily available in mobile phones. The results confirm that by combining a number of common computer vision techniques it is possible to capture and process a set of valuable attributes from sampled lobster images, including color, length, weight, legs and sex. By combining this image profile with pre-existing data on catch location and landing port, each lobster can be verifiably tracked along the supply chain journey to markets in China. The image processing results achieved in the laboratory show high accuracy in measuring lobster carapace length, which is vital for weight conversion calculations. The results also demonstrate the capability to obtain reliable values for the average color, tail shape and number of legs of a lobster, as used in grading classifications. The findings are a major first step in the development of individual lobster biometric identification and will directly contribute to automating lobster grading in this valuable Australian fishery.
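An illustrative OpenCV sketch of one measurement step the paper describes: segmenting the lobster silhouette and converting an oriented bounding-box length to millimetres. The Otsu threshold, the dark-object-on-light-background assumption, the file path and the pixels-per-mm calibration constant are all placeholders.

```python
# Segment the lobster silhouette and estimate its length (OpenCV). Assumes a
# dark lobster on a light background and a known pixels-per-mm calibration.
import cv2

img = cv2.imread("lobster.jpg")                       # sample image path (placeholder)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
body = max(contours, key=cv2.contourArea)             # largest blob = lobster body
(cx, cy), (w, h), angle = cv2.minAreaRect(body)       # oriented bounding box
PIXELS_PER_MM = 4.2                                   # from a calibration target (assumed)
length_mm = max(w, h) / PIXELS_PER_MM                 # long axis as a length proxy
mean_bgr = cv2.mean(img, mask=mask)[:3]               # average color for grading
```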
{"title":"Image Processing for Traceability: A System Prototype for the Southern Rock Lobster (SRL) Supply Chain","authors":"Son Anh Vo, J. Scanlan, L. Mirowski, P. Turner","doi":"10.1109/DICTA.2018.8615842","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615842","url":null,"abstract":"This paper describes how conventional image processing techniques can be applied to the grading of Southern Rock Lobsters (SRL) to produce a high quality data layer which could be an input into product traceability. The research is part of a broader investigation into designing a low-cost biometric identification solution for use along the entire lobster supply chain. In approaching the image processing for lobster grading a key consideration is to develop a system capable of using low cost consumer grade cameras readily available in mobile phones. The results confirm that by combining a number of common techniques in computer vision it is possible to capture and process a set of valuable attributes from sampled lobster image including color, length, weight, legs and sex. By combining this image profile with other pre-existing data on catch location and landing port each lobster can be verifiably tracked along the supply chain journey to markets in China. The image processing research results achieved in the laboratory show high accuracy in measuring lobster carapace length that is vital for weight conversion calculations. The results also demonstrate the capability to obtain reliable values for average color, tail shape and number of legs on a lobster used in grading classifications. The findings are a major first step in the development of individual lobster biometric identification and will directly contribute to automating lobster grading in this valuable Australian fishery.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117154461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Method for Removing Asymmetric High Density Salt and Pepper Noise
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615814
Allan Pennings, I. Svalbe
The presence of salt and pepper noise in images is a common issue that must be overcome in image analysis. Many potential solutions for removing this noise have been proposed over the years, but these algorithms often assume that salt noise and pepper noise appear in equal densities, which is not necessarily the case. In this paper, several filters are proposed and tested across a range of salt-to-pepper ratios, achieving higher PSNR and SSIM than existing filters.
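As a simple baseline in the spirit of the problem statement (not the paper's proposed filters), the sketch below treats salt (255) and pepper (0) pixels independently and replaces each with the median of its noise-free neighbours, so unequal salt and pepper densities are handled naturally.

```python
# Baseline despeckling: replace each extreme-valued pixel with the median of the
# non-extreme values in its 3x3 neighbourhood. Grayscale uint8 input assumed.
import numpy as np

def despeckle(img):
    out = img.astype(np.float32)
    pad = np.pad(out, 1, mode="edge")
    for y, x in zip(*np.nonzero((img == 0) | (img == 255))):
        win = pad[y:y + 3, x:x + 3]
        valid = win[(win != 0) & (win != 255)]      # drop neighbouring noise pixels
        if valid.size:                              # else leave pixel for a second pass
            out[y, x] = np.median(valid)
    return out.astype(np.uint8)
```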
{"title":"A New Method for Removing Asymmetric High Density Salt and Pepper Noise","authors":"Allan Pennings, I. Svalbe","doi":"10.1109/DICTA.2018.8615814","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615814","url":null,"abstract":"The presence of salt and pepper noise in imaging is a common issue that needs to be overcome in image analysis. Many potential solutions to remove this noise have been discussed over the years, but these algorithms often make the common assumption that salt noise and pepper noise appear in equal densities. This is not necessarily the case. In this paper several filters are proposed and tested across a range of different salt to pepper ratios, which result in higher PSNR and SSIM when compared to other existing filters.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125767391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Drivers Performance Evaluation using Physiological Measurement in a Driving Simulator
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615763
Afsaneh Koohestani, P. Kebria, A. Khosravi, S. Nahavandi
Monitoring drivers' behaviour and detecting their awareness are of vital importance for road safety. Driver distraction and low awareness are already known to be the main causes of accidents worldwide. Distraction-related crashes have greatly increased in recent years due to the proliferation of communication and entertainment systems and the malfunctioning of driver assistance systems. Accordingly, there is a need for advanced systems that monitor a driver's behaviour and generate a warning if a degradation in performance is detected. The purpose of this study is to analyse vehicle and driver data to detect the onset of distraction. Physiological measurements, such as palm electrodermal activity, heart rate, breathing rate and perinasal perspiration, are analysed and applied to the development of the monitoring system. The dataset used in this research contains these measurements for 68 healthy participants (35 male, 33 female; 17 elderly, 51 young). The participants completed two sessions in a driving simulator: a normal drive and a loaded drive. In the loaded scenario, drivers were texting back words while driving. The lane deviation of the vehicle was recorded as the response variable. Different classification algorithms, such as generalised linear models, support vector machines, k-nearest neighbours and random forests, were implemented to classify the driver's performance from the input features. Prediction results indicate that the random forest performs best, achieving an area under the curve (AUC) of over 91%. It is also found that biographic features are not informative enough to analyse driver performance, while perinasal perspiration carries the most information.
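A hedged sketch of the best-performing pipeline in scikit-learn: a random forest over the four physiological features, scored by AUC. The feature matrix and labels below are random stand-ins, not the study's data, so the printed AUC is meaningless except as a usage illustration.

```python
# Random forest on the four physiological measurements, evaluated by AUC.
# X and y here are random placeholders, not the study's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((68, 4))            # EDA, heart rate, breathing rate, perinasal perspiration
y = rng.integers(0, 2, 68)         # distracted (1) vs. attentive (0), illustrative labels
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print("AUC:", roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
```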
{"title":"Drivers Performance Evaluation using Physiological Measurement in a Driving Simulator","authors":"Afsaneh Koohestani, P. Kebria, A. Khosravi, S. Nahavandi","doi":"10.1109/DICTA.2018.8615763","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615763","url":null,"abstract":"Monitoring the drivers behaviour and detecting their awareness are of vital importance for road safety. Drivers distraction and low awareness are already known to be the main reason for accidents in the world. Distraction-related crashes have greatly increased in recent years due to the proliferation of communication, entertainment, and malfunctioning of driver assistance systems. Accordingly, there is a need for advanced systems to monitor the drivers behaviour and generate a warning if a degradation in a drivers performance is detected. The purpose of this study is to analyse the vehicle and drivers data to detect the onset of distraction. Physiological measurements, such as palm electrodermal activity, heart rate, breathing rate, and perinasal perspiration are analysed and applied for the development of the monitoring system. The dataset used in this research has these measurements for 68 healthy participants (35 male, 33 female/17 elderly, 51 young). These participants completed two driving sessions in a driving simulator, including the normal and loaded drive. In the loaded scenario, drivers were texting back words. The lane deviation of vehicle was recorded as the response variable. Different classification algorithms such as generalised linear, support vector model, K-nearest neighbour and random forest machines are implemented to classify the driver's performance based on input features. Prediction results indicate that random forest performs the best by achieving an area under the curve (AUC) of over 91%. It is also found that biographic features are not informative enough to analyse drivers performance while perinasal perspiration carries the most information.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123474926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}