Stylized Sketch Generation using Convolutional Networks
Mayur Hemani, Abhishek Sinha, Balaji Krishnamurthy
DOI: 10.24132/csrn.2019.2901.1.5

The task of synthesizing sketches from photographs has been pursued with image processing methods and with supervised learning approaches. The former lack flexibility, and the latter require large quantities of ground-truth data, which are hard to obtain because of the manual effort involved. We present a convolutional neural network based framework for sketch generation that does not require ground-truth data for training and produces sketches in a variety of styles. The method combines simple analytic loss functions, each corresponding to a characteristic of the sketch. The network is trained and evaluated on human face images. Several stylized variations of the sketches are obtained by varying the parameters of the loss functions. The paper also discusses the implicit abstraction afforded by the deep convolutional approach, which results in high-quality sketch output.
CNN Approaches for Dorsal Hand Vein Based Identification
Szidónia Lefkovits, László Lefkovits, L. Szilágyi
DOI: 10.24132/csrn.2019.2902.2.7
In this paper we present a dorsal hand vein recognition method based on convolutional neural networks (CNNs). We implemented two CNNs trained end-to-end and compared them to the most important state-of-the-art deep learning architectures (AlexNet, VGG, ResNet, and SqueezeNet), applying transfer learning and fine-tuning for dorsal hand vein based identification. The experiments studied the accuracy and training behaviour of these network architectures. The system was trained and evaluated on the best-known database in this field, NCUT, which contains low-resolution, low-contrast images. Different pre-processing steps were therefore required, leading us to investigate the influence of a series of image quality enhancement methods: Gaussian smoothing, inhomogeneity correction, contrast limited adaptive histogram equalization (CLAHE), ordinal image encoding, and coarse vein segmentation based on geometrical considerations. The results show high recognition accuracy for almost every such CNN-based setup.
{"title":"CNN Approaches for Dorsal Hand Vein Based Identification","authors":"Szidónia Lefkovits, László Lefkovits, L. Szilágyi","doi":"10.24132/csrn.2019.2902.2.7","DOIUrl":"https://doi.org/10.24132/csrn.2019.2902.2.7","url":null,"abstract":"In this paper we present a dorsal hand vein recognition method based on convolutional neural networks (CNN). We implemented and compared two CNNs trained from end-to-end to the most important state-of-the-art deep learning architectures (AlexNet, VGG, ResNet and SqueezeNet). We applied the transfer learning and finetuning techniques for the purpose of dorsal hand vein-based identification. The experiments carried out studied the accuracy and training behaviour of these network architectures. The system was trained and evaluated on the best-known database in this field, the NCUT, which contains low resolution, low contrast images. Therefore, different pre-processing steps were required, leading us to investigate the influence of a series of image quality enhancement methods such as Gaussian smoothing, inhomogeneity correction, contrast limited adaptive histogram equalization, ordinal image encoding, and coarse vein segmentation based on geometricalconsiderations. The results show high recognition accuracy for almost every such CNN-based setup.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117033378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compressed Exposure Sequences for HDR Imaging
S. Sekmen, A. Akyüz
DOI: 10.24132/csrn.2019.2901.1.17
High dynamic range (HDR) imaging techniques allow photographers to capture the luminance distribution of the real world as it is, freeing them from the limitations of capture and display devices. One common approach for creating HDR images is the multiple exposures technique (MET), preferred by many photographers because multiple exposures can be captured with off-the-shelf digital cameras and later combined into an HDR image. In this study, we propose a storage scheme that simplifies the maintenance and usability of such sequences. In our scheme, multiple exposures are stored inside a single JPEG file, with the main image representing a user-selected reference exposure. The other exposures are not stored directly; instead, their differences from each other and from the reference are stored in compressed form in the metadata section of the same file. This allows a significant reduction in file size without impacting quality. If necessary, the original exposures can be reconstructed from this single JPEG file, which in turn can be used in a standard HDR workflow.
{"title":"Compressed Exposure Sequences for HDR Imaging","authors":"S. Sekmen, A. Akyüz","doi":"10.24132/csrn.2019.2901.1.17","DOIUrl":"https://doi.org/10.24132/csrn.2019.2901.1.17","url":null,"abstract":"High dynamic range (HDR) imaging techniques allow photographers to capture the luminance distribution in the real-world as it is, freeing them from the limitations of capture and display devices. One common approach for creating HDR images is the multiple exposures technique (MET). This technique is preferred by many photographers as multiple exposures can be captured with off-the-shelf digital cameras and later combined into an HDR image. In this study, we propose a storage scheme in order to simplify the maintenance and usability of such sequences. In our scheme, multiple exposures are stored inside a single JPEG file with the main image representing a user-selected reference exposure. Other exposures are not directly stored, but rather their differences with each other and the reference is stored in a compressed manner in the metadata section of the same file. This allows a significant reduction in file size without impacting quality. If necessary the original exposures can be reconstructed from this single JPEG file, which in turn can be used in a standard HDR workflow.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124025486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Immersive Analytics Sensemaking on Different Platforms
Sebastian Blum, Gokhan Cetin, W. Stuerzlinger
DOI: 10.24132/csrn.2019.2902.2.9
In this work we investigated sensemaking activities on different immersive platforms. We observed users during a classification task on a very large wall-display system (experiment I) and in a modern virtual reality headset (experiment II). In experiment II, we also evaluated a condition with a VR headset whose field of view was extended through a sparse peripheral display. We evaluated the results across the two studies by analyzing quantitative and qualitative data such as task completion time, number of classifications, the strategies followed, and the shape of clusters. The results showed differences in user behavior between the two immersive platforms, i.e., the very large display wall and the VR headset. Even though the quantitative data showed no significant differences, users qualitatively employed additional strategies on the wall display, which hints at a deeper level of sensemaking than with a VR headset. The qualitative and quantitative results of the comparison between VR headsets do not indicate that users perform differently with an extended field of view.
{"title":"Immersive Analytics Sensemaking on Different Platforms","authors":"Sebastian Blum, Gokhan Cetin, W. Stuerzlinger","doi":"10.24132/csrn.2019.2902.2.9","DOIUrl":"https://doi.org/10.24132/csrn.2019.2902.2.9","url":null,"abstract":"In this work we investigated sensemaking activities on different immersive platforms. We observed user s during a classification task on a very large wall-display system (experiment I) and in a modern Virtual Reality headset (experiment II). In experiment II, we also evaluated a condition with a VR headset with an extended field of view, through a sparse peripheral display. We evaluated the results across the two studies by analyzing quantitative and qualitative data, such as task completion time, number of classifications, followed strategies, and shape of clusters. The results showed differences in user behaviors between the different immersive platforms, i.e., the very large display wall and the VR headset. Even though quantitative data showed no significant differences, qualitatively, users used additional strategies on the wall-display, which hints at a deeper level of sensemaking compared to a VR Headset. The qualitative and quantitative results of the comparison between VR Headsets do not indicate that users perform differently with a VR Headset with an extended field of view.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122583538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real Time Pedestrian and Object Detection and Tracking-based Deep Learning. Application to Drone Visual Tracking
R. Khemmar, M. Gouveia, B. Decoux, J. Ertaud
DOI: 10.24132/csrn.2019.2902.2.5
This work presents new approaches in embedded vision dedicated to object detection and tracking for drone visual control. Object/pedestrian detection is carried out through two methods: 1. A classical image processing approach, through improved Histogram of Oriented Gradients (HOG) and Deformable Part Model (DPM) based detection and pattern recognition methods. Here we present our improved HOG/DPM approach, which not only detects a target object (pedestrian) in real time but also estimates the distance between the target and the drone. 2. A deep learning based object/pedestrian detection approach. The target position is estimated through image analysis, after which the system sends instructions to the drone's motors to correct its position and track the target. For this visual servoing, we applied our improved HOG approach and implemented two kinds of PID controllers. The platform was validated under different scenarios by comparing measured data to ground-truth data given by the drone's GPS. Several tests carried out at the ESIGELEC car park and in Rouen city center validate the developed platform.
{"title":"Real Time Pedestrian and Object Detection and Tracking-based Deep Learning. Application to Drone Visual Tracking","authors":"R. Khemmar, M. Gouveia, B. Decoux, J. Ertaud","doi":"10.24132/csrn.2019.2902.2.5","DOIUrl":"https://doi.org/10.24132/csrn.2019.2902.2.5","url":null,"abstract":"This work aims to show the new approaches in embedded vision dedicated to object detection and tracking for drone visual control. Object/Pedestrian detection has been carried out through two methods: 1. Classical image processing approach through \u0000improved Histogram Oriented Gradient (HOG) and Deformable Part Model (DPM) based detection and pattern recognition methods. In this step, we present our improved HOG/DPM approach allowing the detection of a target object in real time. The developed \u0000approach allows us not only to detect the object (pedestrian) but also to estimates the distance between the target and the drone. 2. Object/Pedestrian detection-based Deep Learning approach. The target position estimation has been carried out within image \u0000analysis. After this, the system sends instruction to the drone engine in order to correct its position and to track target. For this visual servoing, we have applied our improved HOG approach and implemented two kinds of PID controllers. The platform has been \u0000validated under different scenarios by comparing measured data to ground truth data given by the drone GPS. Several tests which were ca1rried out at ESIGELEC car park and Rouen city center validate the developed platform.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125440009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Porting A Visual Inertial SLAM Algorithm To Android Devices
Jannis Möller, Benjamin Meyer, M. Eisemann
DOI: 10.24132/csrn.2019.2902.2.2
Simultaneous localization and mapping (SLAM) aims to identify the current position of an agent and to map its surroundings at the same time. Visual inertial SLAM (VI-SLAM) algorithms use input from visual and motion sensors for this task. Since modern smartphones are equipped with both kinds of sensors, VI-SLAM applications become feasible, with augmented reality being one of the most promising application areas. Android, having the largest market share of all mobile operating systems, is of special interest as the target platform. For iOS there already exists a high-quality open-source implementation of VI-SLAM: the VINS-Mobile framework. In this work we discuss the steps necessary to port it to the Android operating system. We provide a practical guide to the main challenge: the correct calibration of device-specific parameters for any Android smartphone. We present our results on the Samsung Galaxy S7 and show possibilities for further improvement.
{"title":"Porting A Visual Inertial SLAM Algorithm To Android Devices","authors":"Jannis Möller, Benjamin Meyer, M. Eisemann","doi":"10.24132/csrn.2019.2902.2.2","DOIUrl":"https://doi.org/10.24132/csrn.2019.2902.2.2","url":null,"abstract":"Simultaneous Localization and Mapping aims to identify the current position of an agent and to map his surroundings at the same time. Visual inertial SLAM algorithms use input from visual and motion sensors for this task. Since modern smartphones are equipped with both needed sensors, using VI-SLAM applications becomes feasible, with Augmented Reality being one of the most promising application areas. Android, having the largest market share of all mobile operating systems, is of special interest as the target platform. For iOS there already exists a high-quality open source implementation for VI-SLAM: The framework VINS-Mobile. In this work we discuss what steps are necessary for porting it to the Android operating system. We provide a practical guide to the main challenge: The correct calibration of device specific parameters for any Android smartphone. We present our results using the Samsung Galaxy S7 and show further improvement possibilities.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124816167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collateral effects of the Kalman Filter on the Throughput of a Head-Tracker for Mobile Devices
Maria Francesca Roig-Maimó, Ramon Mas-Sansó
DOI: 10.24132/csrn.2019.2901.1.14
We have developed an image-based head-tracker interface for mobile devices that uses the front camera to detect and track the user's nose position and translate its movements into a pointing metaphor on the device. However, as already noted in the literature, the measurement errors of the motion tracking lead to a noticeable jittering of the perceived motion. To counterbalance this unpleasant and unwanted behavior, we apply a Kalman filter to smooth the obtained positions. In this paper we focus on the effect that the Kalman filter can have on the throughput of the interface. Throughput is the human performance measure proposed by ISO 9241-411 for evaluating the efficiency and effectiveness of non-keyboard input devices. The smoothness and precision improvements that the Kalman filter confers on the tracking of the cursor are subjectively evident. However, its effect on the ISO throughput has to be measured objectively to estimate the benefits and drawbacks of applying a Kalman filter to a pointing device.
{"title":"Collateral effects of the Kalman Filter on the Throughput of a Head-Tracker for Mobile Devices","authors":"Maria Francesca Roig-Maimó, Ramon Mas-Sansó","doi":"10.24132/csrn.2019.2901.1.14","DOIUrl":"https://doi.org/10.24132/csrn.2019.2901.1.14","url":null,"abstract":"We have developed an image-based head-tracker interface for mobile devices that uses the information of the front camera to detect and track the user’s nose position and translate its movements into a pointing metaphor to the device. However, as already noted in the literature, the measurement errors of the motion tracking leads to a noticeable jittering of the perceived motion. To counterbalance this unpleasant and unwanted behavior, we have applied a Kalman filter to smooth the obtained positions. In this paper we focus on the effect that the use of a Kalman filter can have on the throughput of the interface. Throughput is the human performance measure proposed by the ISO 9241-411 for evaluating the efficiency and effectiveness of non-keyboard input devices. The softness and precision improvements that the Kalman filter infers in the tracking of the cursor are subjectively evident. However, its effects on the ISO’s throughput have to be measured objectively to get an estimation of the benefits and drawbacks of applying a Kalman filter to a pointing device.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124682336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving facial attraction in videos
Maycon Prado Rocha Silva, J. M. D. Martino
DOI: 10.24132/csrn.2019.2902.2.8
The face plays an important role both socially and culturally and has been studied extensively, especially in investigations of perception. It is accepted that an attractive face tends to draw and keep the attention of the observer for longer. Drawing and keeping attention is an important issue that can be beneficial in a variety of applications, including advertising, journalism, and education. In this article, we present a fully automated process to improve the attractiveness of faces in images and video. Our approach automatically identifies points of interest on the face and measures the distances between them; using classifiers, it searches a database of reference face images deemed attractive to identify the pattern of points of interest best suited to improving attractiveness. The modified points of interest are projected in real time onto a three-dimensional face mesh to support the consistent transformation of the face across a video sequence. In addition to the geometric transformation, the texture is automatically smoothed through a smoothing mask and a weighted sum of textures. The process as a whole enables improving attractiveness not only in images but also in videos, in real time.
{"title":"Improving facial attraction in videos","authors":"Maycon Prado Rocha Silva, J. M. D. Martino","doi":"10.24132/csrn.2019.2902.2.8","DOIUrl":"https://doi.org/10.24132/csrn.2019.2902.2.8","url":null,"abstract":"The face plays an important role both socially and culturally and has been extensively studied especially in investigations on perception. It is accepted that an attractive face tends to draw and keep the attention of the observer for a longer time. Drawing and keeping the attention is an important issue that can be beneficial in a variety of applications, including advertising, journalism, and education. In this article, we present a fully automated process to improve the attractiveness of faces in images and video. Our approach automatically identifies points of interest on the face and measures the distances between them, fusing the use of classifiers searches the database of reference face images deemed to be attractive to identify the pattern of points of interest more adequate to improve the attractiveness. The modified points of interest are projected in real-time onto a three-dimensional face mesh to support the consistent transformation of the face in a video sequence. In addition to the geometric transformation, texture is also automatically smoothed through a smoothing mask and weighted sum of textures. The process as a whole enables the improving of attractiveness not only in images but also in videos in real time.","PeriodicalId":322214,"journal":{"name":"Computer Science Research Notes","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125031083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}