Is the transmission of depth data always necessary for 3D video streaming?
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608123
Li Yu, M. Hannuksela, T. Tillo, Chunyu Lin, M. Gabbouj
Depth data is of vital importance in 3D video streaming, as it allows flexible rendering of views at arbitrary viewpoints. Given this importance, the Multi-view Video plus Depth (MVD) format has conventionally been used, in which depth data is transmitted along with the texture data. In this work, we argue that transmitting the depth data is not necessary when 1) bandwidth is limited and 2) viewpoint switching is infrequent. We propose that depth transmission can be replaced by a receiver-side unit that estimates depth from the received multi-view videos. This replacement not only spares the bandwidth dedicated to depth transmission, but also achieves rate-distortion performance competitive with the MVD method.
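The abstract leaves the receiver-side estimator unspecified; as a rough sketch of the idea, the unit could derive depth from a decoded, rectified stereo pair with an off-the-shelf matcher such as OpenCV's semi-global block matching (the choice of SGBM and all parameter values are our assumptions, not the paper's):

```python
# Sketch: receiver-side depth estimation from two decoded, rectified views,
# standing in for transmitted depth maps. SGBM and its parameters are
# illustrative assumptions; the paper does not specify the estimator.
import cv2
import numpy as np

def estimate_depth(left_view, right_view, focal_px, baseline_m):
    gray_l = cv2.cvtColor(left_view, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right_view, cv2.COLOR_BGR2GRAY)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5,
                                 uniquenessRatio=10)
    # compute() returns fixed-point disparities scaled by 16
    disparity = sgbm.compute(gray_l, gray_r).astype(np.float32) / 16.0
    disparity[disparity <= 0] = 0.1            # guard against division by zero
    return focal_px * baseline_m / disparity   # depth = f * B / d
```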
Deep Dilated Convolutional Network for Material Recognition
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608160
Xiaoyue Jiang, Junna Du, B. Sun, Xiaoyi Feng
Material is one of the intrinsic properties of objects; consequently, material recognition plays an important role in image understanding. The same material may take on various shapes and appearances while keeping the same physical characteristics, which poses great challenges for material recognition. Most recent material recognition methods are based on image patches and cannot give accurate segmentation results for each specific material. In this paper, we propose a deep learning based method that performs pixel-level material segmentation on whole images directly. In a classical convolutional network, the spatial size of the features shrinks as convolutional layers are stacked, which loses the details needed for pixel-wise segmentation. We therefore propose to use dilated convolutional layers to preserve feature details. In addition, the dilated convolutional features are combined with traditional convolutional features to remove the artifacts brought by dilated convolution. In the experiments, the proposed dilated network showed its effectiveness on the popular MINC dataset and its extended version.
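To make the mechanism concrete: a 3x3 convolution with dilation 2 and padding 2 keeps the feature map's spatial size while enlarging the receptive field. A minimal PyTorch sketch of fusing a dilated branch with a standard branch follows; the channel sizes and the additive fusion are our assumptions, not the paper's architecture:

```python
# Sketch: a dilated branch preserves full spatial resolution with a wider
# receptive field; fusing it with a standard conv branch suppresses the
# gridding artifacts the dilated branch introduces.
import torch
import torch.nn as nn

class FusedDilatedBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=64):
        super().__init__()
        # dilation=2 with padding=2 keeps a 3x3 kernel's output the same
        # size as its input, unlike strided or pooled layers
        self.dilated = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                                 padding=2, dilation=2)
        self.standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.dilated(x) + self.standard(x))

# Usage: FusedDilatedBlock()(torch.randn(1, 64, 128, 128)) -> (1, 64, 128, 128)
```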
A Look At Non-Cooperative Presentation Attacks in Fingerprint Systems
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608133
Emanuela Marasco, S. Cando, Larry L Tang, Luca Ghiani, G. Marcialis
The scientific literature lacks countermeasures specifically designed for fingerprint presentation attacks (PAs) realized with non-cooperative methods, even though, in realistic scenarios, it is unlikely that individuals would agree to duplicate their fingerprints. For example, replicas can be created from finger marks left on a surface without the person's knowledge. Existing anti-spoofing mechanisms are trained to detect presentation attacks realized with the cooperation of the user and are assumed to identify non-cooperative spoofs as well. In this regard, latent prints are perceived to be of low quality and less likely to succeed in gaining unauthorized access; thus, they are expected to be blocked without the need for a dedicated presentation attack detection system. Currently, the lowest Presentation Attack Detection (PAD) error rates on spoofs from latent prints are achieved by frameworks involving Convolutional Neural Networks (CNNs) trained on cooperative PAs; however, the computational requirements of these networks make them hard to port to mobile applications. The focus of this paper is therefore to investigate the degree of success of spoofs made from latent fingerprints, in order to improve the understanding of their vitality features. Furthermore, we experimentally show the performance drop of existing liveness detectors when dealing with non-cooperative attacks, and we analyze the quality estimates of such spoofs, which are commonly believed to be of lower quality than molds fabricated with the user's consent.
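For reference, PAD performance of this kind is usually reported with the ISO/IEC 30107-3 error rates. A small sketch of computing them is below; the paper's exact protocol and score convention are not given in the abstract, so the "higher score = live" convention is an assumption:

```python
# Sketch: standard PAD error rates for comparing detectors on cooperative
# vs. non-cooperative spoofs. labels: 1 = bona fide (live), 0 = attack.
import numpy as np

def pad_error_rates(scores, labels, threshold):
    """APCER: attacks accepted as live; BPCER: live rejected as attack."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    attacks = scores[labels == 0]
    bona_fide = scores[labels == 1]
    apcer = np.mean(attacks >= threshold)   # attack classified as live
    bpcer = np.mean(bona_fide < threshold)  # live classified as attack
    return apcer, bpcer
```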
A Study of Measures for Contour-based Recognition and Localization of Known Objects in Digital Images
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608165
H. Abdulrahman, Baptiste Magnier
Usually, the most important structures in an image are extracted by an edge detector. Once the extracted edges are binarized, they represent the shape boundary information of an object. For the edge-based localization/matching process, the differences between a reference edge map and a candidate image are quantified by computing a performance measure. This study investigates supervised contour measures for determining the degree to which an object shape differs from a desired position. Several distance measures are therefore evaluated under different shape alterations: translation, rotation, and scale change. Experiments on both synthetic and real images show which measures are accurate enough for object pose or matching estimation, which is useful for robotic tasks such as refining an object pose.
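As an illustration of the kind of measure being compared, the symmetric Hausdorff distance between two binarized edge maps is a classic choice (treating it as representative is our assumption; the abstract does not list the evaluated measures):

```python
# Sketch: symmetric Hausdorff distance between two binary edge maps,
# a classic edge-map dissimilarity measure.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_edge_distance(edge_map_a, edge_map_b):
    """edge_map_a, edge_map_b: 2-D boolean arrays (binarized edge maps)."""
    pts_a = np.argwhere(edge_map_a)   # (row, col) coordinates of edge pixels
    pts_b = np.argwhere(edge_map_b)
    d_ab = directed_hausdorff(pts_a, pts_b)[0]
    d_ba = directed_hausdorff(pts_b, pts_a)[0]
    return max(d_ab, d_ba)
```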
Human-Computer Interaction using Finger Signing Recognition with Hand Palm Centroid PSO Search and Skin-Color Classification and Segmentation
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608145
Z. Hamici
This paper presents a novel image processing technique for recognizing the finger-sign language alphabet. A human-computer interaction system is built on sign language recognition, serving as an interface between the computer and hearing-impaired persons, or as an assistive technology in industrial robotics. The recognition is articulated around the extraction of the contours of the sign language alphabets, thereby converting image recognition into one-dimensional signal processing, which improves recognition efficiency and significantly reduces processing time. Image pre-processing is performed by a novel skin-color region segmentation defined inside the standard RGB (sRGB) color space, followed by morphological filtering to remove non-skin residuals. Afterwards, a circular correlation identifies the sign by extracting the sign's closed-contour vector and matching it against the target alphabet vectors. The closed-contour vector is generated around the hand-palm centroid, whose position is optimized by a particle swarm optimization search. Finally, a multi-objective function computes the recognition score. The results presented in this paper for skin-color segmentation, centroid search, and pattern recognition show the high effectiveness of the novel artificial vision engine.
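A minimal sketch of the matching step, assuming the closed-contour vector is a 1-D signature sampled around the palm centroid (the exact signature definition is not given in the abstract), computes the circular correlation efficiently via the FFT:

```python
# Sketch: circular correlation of a closed-contour signature against a
# target alphabet signature. Start-point shifts along the contour become
# circular shifts, so taking the maximum over all shifts makes the score
# invariant to where the contour trace begins.
import numpy as np

def circular_correlation_score(contour_sig, target_sig):
    """Both inputs: equal-length 1-D arrays, e.g., centroid-to-contour
    distances sampled at fixed angular steps around the palm centroid."""
    a = contour_sig - contour_sig.mean()
    b = target_sig - target_sig.mean()
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    # circular cross-correlation for all shifts in O(n log n) via the FFT
    corr = np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real
    return corr.max()  # best score over all circular shifts
```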
Acoustic Based Method for Automatic Segmentation of Images of Objects in Periodic Motion: detection of vocal folds edges case study
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608152
Bartosz Kopczynski, P. Strumiłło, Marcin Just, E. Niebudek-Bogusz
We describe a novel image segmentation technique for the automated detection of objects in periodic motion that generate acoustic waves. The method is based on measuring the similarity of two independently collected but time-synchronized data streams, i.e., audio signals and image sequences. This technique enables an automatic and optimized segmentation procedure for a sequence of images depicting an oscillating object. The proposed segmentation procedure has been validated on the problem of detecting the edges of vibrating vocal folds. The similarity between the synchronously collected sequence of laryngoscopic images and the voice signal is measured by applying time-frequency analysis. The developed segmentation technique and motion analysis method can be applied to the early detection of oscillation anomalies of the vocal folds, which may cause hoarse voice, also known as dysphonia. In particular, the image segmentation result can aid the phoniatrist in analyzing the vocal-fold phonation process and help in the early detection of voice anomalies.
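One plausible reading of the similarity measure, sketched below entirely under our own assumptions, is to compare the dominant oscillation frequency of the voice signal with that of a per-frame signal extracted from a candidate image region (e.g., mean brightness of the glottal area per frame):

```python
# Sketch: agreement between the dominant frequency of the audio and of a
# candidate image-region signal. Feature choice and the comparison rule
# are assumptions; the paper's time-frequency analysis may differ.
import numpy as np
from scipy.signal import periodogram

def dominant_frequency(signal, fs):
    freqs, power = periodogram(signal, fs=fs)
    return freqs[np.argmax(power[1:]) + 1]  # skip the DC bin

def audio_image_dissimilarity(audio, audio_fs, region_signal, frame_rate):
    f_audio = dominant_frequency(audio, audio_fs)
    f_image = dominant_frequency(region_signal, frame_rate)
    return abs(f_audio - f_image)  # small value = oscillations agree
```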
Pedestrian Detection in Infrared Images Using Fast RCNN
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608121
Asad Ullah, Hongmei Xie, M. Farooq, Zhaoyun Sun
Compared to visible-spectrum images, infrared images are much clearer in poor lighting conditions. Infrared imaging devices can operate even without visible light, acquiring clear images of objects that support efficient classification and detection. For image object classification and detection, CNNs, which belong to the class of feed-forward ANNs, have been used successfully. Fast RCNN combines the advantages of modern CNN detectors, i.e., RCNN and SPPnet, to classify object proposals more efficiently, resulting in better and faster detection. To further improve the detection rate and speed of Fast RCNN, two modifications are proposed in this paper: one for accuracy, in which an extra convolutional layer is added to the network (named Fast RCNN type 2), and one for speed, in which the input is reduced from three channels to one (named Fast RCNN type 3). Fast RCNN (type 1) has a better detection rate than RCNN; compared to it, type 2 achieves a better detection rate while type 3 is faster.
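The "type 3" speed modification can be illustrated by collapsing the first convolution of a backbone from three input channels to one, which suits single-channel infrared frames. The sketch below uses torchvision's VGG16 as a stand-in, since the abstract does not name the base network:

```python
# Sketch: swap the backbone's first conv from 3-channel (RGB) input to
# 1-channel (infrared) input. VGG16 is an illustrative stand-in backbone.
import torch
import torch.nn as nn
from torchvision.models import vgg16

def single_channel_backbone():
    net = vgg16(weights=None)        # trained from scratch in this sketch
    old = net.features[0]            # Conv2d(3, 64, kernel_size=3, padding=1)
    net.features[0] = nn.Conv2d(1, old.out_channels,
                                kernel_size=old.kernel_size,
                                padding=old.padding)
    return net.features

feats = single_channel_backbone()
print(feats(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 512, 7, 7])
```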
Research on Low-Resolution Pedestrian Detection Algorithms based on R-CNN with Targeted Pooling and Proposal
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608142
Peng Shi, Jun Wu, Kai Wang, Yao Zhang, Jiapei Wang, Juneho Yi
We present an effective low-resolution pedestrian detector that uses targeted pooling and the Region Proposal Network (RPN) of Faster R-CNN. Our method first rearranges the anchors of the RPN using an optimized hyper-parameter setting called the "Elaborate Setup". Second, it refines the granularity of the pooling operation in the ROI pooling layer. The experimental results demonstrate that the proposed RPN together with fine-grained pooling, which we call LRPD-R-CNN, achieves high average precision and robust performance on the VOC 2007 dataset. This method has great commercial potential and wide application prospects in computer vision, security, and intelligent cities.
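The anchor rearrangement can be sketched as regenerating RPN anchors with scales and aspect ratios re-tuned for small, upright pedestrians. The concrete values of the "Elaborate Setup" are not given in the abstract, so the numbers below are placeholders (0.41 is the classic pedestrian width/height ratio from the pedestrian detection literature):

```python
# Sketch: RPN anchor generation with pedestrian-oriented scales/ratios.
# All numeric values are illustrative placeholders.
import numpy as np

def make_anchors(base_size=8, scales=(2, 4, 8), ratios=(0.41, 1.0)):
    """Return (len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2),
    centered at the origin. ratio = width / height."""
    anchors = []
    for s in scales:
        for r in ratios:
            h = base_size * s / np.sqrt(r)   # taller boxes for r < 1
            w = base_size * s * np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)
```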
Classification of LiDAR Point Cloud based on Multiscale Features and PointNet
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608120
Zhao Zhongyang, Cheng Yinglei, Shi Xiaosong, Qin Xianxiang, Sun Li
To classify LiDAR point cloud data in complex scenarios, this paper proposes a deep neural network model based on multi-scale features and PointNet. The method improves the local features of PointNet and realizes automatic classification of LiDAR point clouds in complex scenes. First, a multi-scale network is added on top of the PointNet architecture to extract local features of points. These local features at different scales are then composed into a multi-dimensional feature through a fully connected layer and combined with the global features extracted by PointNet; scores for each point class are returned to complete the point cloud classification. The proposed model is verified on the Semantic3D dataset and the Vaihingen dataset provided by ISPRS. The experimental results show that the proposed algorithm achieves higher classification accuracy than other neural networks used for point cloud classification.
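The fusion step can be sketched as concatenating per-point local features pooled at several neighborhood scales with PointNet's global feature before a per-point classifier; all dimensions below are illustrative, not the paper's:

```python
# Sketch: per-point multi-scale local features concatenated with a
# broadcast global feature, followed by a shared per-point classifier.
import torch
import torch.nn as nn

class MultiScaleFusionHead(nn.Module):
    def __init__(self, local_dims=(64, 128), global_dim=1024, n_classes=8):
        super().__init__()
        fused = sum(local_dims) + global_dim
        self.mlp = nn.Sequential(nn.Linear(fused, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))

    def forward(self, local_feats, global_feat):
        # local_feats: list of (N, d_i) per-point features, one per scale
        # global_feat: (global_dim,) feature, broadcast to every point
        n_pts = local_feats[0].shape[0]
        g = global_feat.unsqueeze(0).expand(n_pts, -1)
        return self.mlp(torch.cat(local_feats + [g], dim=1))  # (N, n_classes)

# Usage: MultiScaleFusionHead()([torch.randn(2048, 64), torch.randn(2048, 128)],
#                               torch.randn(1024)) -> (2048, 8) class scores
```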
Driver Drowsiness Detection in Facial Images
Pub Date: 2018-11-01 | DOI: 10.1109/IPTA.2018.8608130
F. Dornaika, J. Reta, Ignacio Arganda-Carreras, A. Moujahid
Extracting effective features of fatigue in images and videos is an open problem. This paper introduces a face image descriptor that can be used for discriminating driver fatigue in static frames. In this method, each facial image in the sequence is first represented by a pyramid whose levels are divided into non-overlapping blocks of the same size, and a hybrid image descriptor is employed to extract features in all blocks. The obtained descriptor is then reduced by feature selection. Finally, a non-linear SVM is applied to predict the drowsiness state of the subject in the image. The proposed method was tested on the public NTHU Drowsy Driver Detection (NTHU-DDD) dataset, which includes a wide range of human subjects of different genders, poses, and illuminations in real-life fatigue conditions. Experimental results show the effectiveness of the proposed method and that the proposed hand-crafted features compare favorably with several approaches based on deep Convolutional Neural Networks.
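A minimal sketch of the pipeline, with uniform LBP standing in for the unspecified hybrid descriptor (an assumption on our part), builds the pyramid of block histograms and feeds the concatenation to a non-linear SVM:

```python
# Sketch: spatial pyramid of non-overlapping blocks, one LBP histogram per
# block, concatenated into a face descriptor. LBP is an illustrative
# stand-in for the paper's hybrid descriptor.
import numpy as np
from skimage.feature import local_binary_pattern

def pyramid_descriptor(face_gray, levels=(1, 2, 4), n_bins=59):
    """face_gray: 2-D grayscale face crop; levels: g x g grid per level.
    59 bins = number of non-rotation-invariant uniform patterns for P=8."""
    feats = []
    h, w = face_gray.shape
    for g in levels:
        for i in range(g):
            for j in range(g):
                block = face_gray[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g]
                lbp = local_binary_pattern(block, P=8, R=1,
                                           method="nri_uniform")
                hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins))
                feats.append(hist / (hist.sum() + 1e-9))
    return np.concatenate(feats)

# Usage (hypothetical training data):
#   from sklearn.svm import SVC
#   clf = SVC(kernel="rbf").fit(train_feats, train_labels)  # drowsy vs. alert
```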