Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707374
Automated Detection of Colorspace Via Convolutional Neural Network
S. Maxwell, M. Kilcher, Alexander Benasutti, Brandon Siebert, Warren Seto, Olivia Shanley, Larry Pearlstein
Prior to the advent of ITU-R Recommendation BT.709, the overwhelming majority of compressed digital video and imagery used the colorspace conversion matrix specified in ITU-R Recommendation BT.601. The introduction of high-definition video formats led new systems to adopt Rec. BT.709 for colorspace conversion, and this has resulted in confusion in the industry. Specifically, video decoders may not be able to determine the correct matrix for converting from the luma/chroma representation used for coding to the red-green-blue (RGB) representation needed for display. This confusion has led to a situation in which some viewers of decompressed video streams experience subtle but noticeable errors in coloration. We have developed and trained a deep convolutional neural network to address this heretofore unsolved problem. We obtained outstanding accuracy on ImageNet data and on YouTube video frames, and our work can be expected to lead to more accurate color rendering for users of digital imaging and video systems.
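To make the ambiguity concrete, the minimal sketch below (not the authors' network) decodes the same full-range YCbCr pixel with the BT.601 and BT.709 matrices; the per-channel difference is the kind of subtle coloration error a decoder introduces when it guesses the wrong matrix. The test pixel and the assumption of full-range samples are illustrative.

```python
# Minimal sketch: decode one full-range YCbCr pixel with the BT.601 and
# BT.709 luma coefficients to show the color shift that motivates
# automatic colorspace detection.
import numpy as np

def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Convert one full-range YCbCr triplet (0..255) to RGB given luma weights."""
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * (cr - 128.0)
    b = y + 2.0 * (1.0 - kb) * (cb - 128.0)
    g = (y - kr * r - kb * b) / kg
    return np.clip([r, g, b], 0, 255)

# Luma coefficients from the two recommendations.
BT601 = dict(kr=0.299, kb=0.114)
BT709 = dict(kr=0.2126, kb=0.0722)

pixel = (120.0, 90.0, 180.0)  # a moderately saturated test pixel
rgb_601 = ycbcr_to_rgb(*pixel, **BT601)
rgb_709 = ycbcr_to_rgb(*pixel, **BT709)
print("decoded as BT.601:", rgb_601)
print("decoded as BT.709:", rgb_709)
print("per-channel error if the wrong matrix is chosen:", rgb_709 - rgb_601)
```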
{"title":"Automated Detection of Colorspace Via Convolutional Neural Network","authors":"S. Maxwell, M. Kilcher, Alexander Benasutti, Brandon Siebert, Warren Seto, Olivia Shanley, Larry Pearlstein","doi":"10.1109/AIPR.2018.8707374","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707374","url":null,"abstract":"Prior to the advent of ITU-R Recommendation BT.709 the overwhelming majority of compressed digital video and imagery used the colorspace conversion matrix specified in ITU-R Recommendation BT.601. The introduction of high-definition video formats led to the adoption of Rec. BT.709 for use in colorspace conversion by new systems, and this resulted in confusion in the industry. Specifically, video decoders may not be able to determine the correct matrix to use for converting from the luma/chroma representation used for coding, to the Red-Green-Blue representation needed for display. This confusion has led to a situation where some viewers of decompressed video streams experience subtle, but noticeable, errors in coloration. We have successfully developed and trained a deep convolutional neural network to address this heretofore unsolved problem. We obtained outstanding accuracy on ImageNet data, and on YouTube video frames, and our work can be expected to lead to more accurate color rendering delivered to users of digital imaging and video systems.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132958460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707382
An Authentication System based on Hybrid Fusion of Finger-Shapes & Geometry
Neha Mittal, M. Hanmandlu, S. Vasikarla
Owing to the ever-increasing demand for privacy protection and to growing security concerns, we develop a hand-geometry-based biometric system aimed at addressing these concerns. In this study we have developed an authentication system based on finger shapes. The finger shapes are treated as patterns from which features are extracted using the eigenvector method and the discrete wavelet transform. Four feature types are used: (i) frequency and power content obtained with the eigenvector method, (ii) Pisarenko's method, (iii) the wavelet entropy of the individual fingers, and (iv) the specific area of each finger.
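As an illustration of one of the listed feature types, the sketch below computes the wavelet entropy of a 1-D finger-shape signal using PyWavelets. The choice of wavelet (db4), the decomposition level, and the contour-radius profile used as input are assumptions for the example, not details taken from the paper.

```python
# Hedged sketch of one feature type from the list above: wavelet entropy of a
# 1-D finger-shape signal (e.g., a contour-radius profile of a segmented finger).
import numpy as np
import pywt

def wavelet_entropy(signal, wavelet="db4", level=4):
    """Shannon entropy of the relative wavelet energy across decomposition sub-bands."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()   # relative energy per sub-band
    p = p[p > 0]                    # guard against log(0)
    return -np.sum(p * np.log2(p))

# Toy contour-radius profile standing in for a segmented finger boundary.
t = np.linspace(0, 2 * np.pi, 512)
profile = 40 + 5 * np.sin(3 * t) + 0.5 * np.random.randn(512)
print("wavelet entropy:", wavelet_entropy(profile))
```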
{"title":"An Authentication System based on Hybrid Fusion of Finger-Shapes & Geometry","authors":"Neha Mittal, M. Hanmandlu, S. Vasikarla","doi":"10.1109/AIPR.2018.8707382","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707382","url":null,"abstract":"Owing to the ever increasing demand of privacy protection and security concerns, hand geometry based biometric system aimed at addressing these concerns is developed. In this study we have developed an authentication system which is based on finger shapes. The shapes of fingers are considered as patterns for extracting features using Eigen vector method and discrete wavelet transform. There are four feature types that include (i) Frequency and power content using Eigen vector method, (ii) Pisaranko’s method (iii) Wavelet entropy of individual fingers, and (iv) Specific area of finger.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134174726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707377
Robust Multi-object Tracking for Wide Area Motion Imagery
Noor M. Al-Shakarji, F. Bunyak, G. Seetharaman, K. Palaniappan
Multi-object tracking in airborne wide area motion imagery (WAMI) remains a challenging problem in computer vision applications. Extreme camera motion, low frame rates, rapid appearance changes, and occlusion by other objects are among the principal challenges. Data association, i.e., linking objects detected in the current frame with existing tracks, is the most challenging part of multi-object tracking algorithms. The ambiguity of data association increases in WAMI datasets because the objects in these scenes lack rich feature descriptions, lie close to one another, and exhibit inaccurate movement-displacement estimates. In this paper, we present a detection-based multi-object tracking system that uses a two-step data association scheme to ensure high tracking accuracy and continuity. The first step produces reliable short-term tracklets using only spatial information. The second step links tracklets globally and reduces the number of matching hypotheses using discriminative features and tracklet history. Our proposed tracker was tested on the wide area imagery ABQ dataset [1]. MOTChallenge [2] evaluation metrics were used to evaluate its performance against several multi-object-tracking baselines from the IWTS4 2018 [3] and VisDrone2018 [4] challenges. Our tracker shows promising results compared with those trackers.
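A hedged sketch of what a spatial-only first association step can look like is given below: existing tracklets are matched to new detections by centroid distance using the Hungarian algorithm with a gating threshold. The paper's full two-step scheme and its discriminative appearance features are not reproduced; the gate value and toy coordinates are illustrative.

```python
# Spatial-only association: match tracklet centroids to detection centroids
# by Euclidean distance, keeping only matches within a gating threshold.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centroids, det_centroids, gate=30.0):
    """Return (track_idx, det_idx) matches whose distance is within the gate."""
    cost = np.linalg.norm(
        track_centroids[:, None, :] - det_centroids[None, :, :], axis=2
    )
    rows, cols = linear_sum_assignment(cost)   # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]

tracks = np.array([[100.0, 200.0], [400.0, 120.0]])
detections = np.array([[405.0, 118.0], [102.0, 203.0], [900.0, 50.0]])
print(associate(tracks, detections))   # -> [(0, 1), (1, 0)]; detection 2 starts a new tracklet
```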
{"title":"Robust Multi-object Tracking for Wide Area Motion Imagery","authors":"Noor M. Al-Shakarji, F. Bunyak, G. Seetharaman, K. Palaniappan","doi":"10.1109/AIPR.2018.8707377","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707377","url":null,"abstract":"Multi-object tracking implemented on airborne wide area motion imagery (WAMI) is still challenging problem in computer vision applications. Extremely camera motion, low frame rate, rapid appearance changes, and occlusion by different objects are the most challenges. Data association, link detected object in the current frame with the existing tracked objects, is the most challenging part for multi-object tracking algorithms. The ambiguity of data association increases in WAMI datasets because objects in the scenes suffer form the lack of rich feature descriptions beside the closeness to each other, and inaccurate object movement displacement. In this paper, detection-based multi-object tracking system that uses a two-step data association scheme to ensure high tracking accuracy and continuity. The first step ensures having reliable short-term tracklets using only spatial information. The second step links tracklets globally and reduces matching hypotheses using discriminative features and tracklets history. Our proposed tracker tested on wide area imagery ABQ dataset [1]. MOTChallage [2] evaluation metrics have been used to evaluate the performance compared to some multi-object-tracking baselines for IWTS42018 [3] and VisDrone2018 [4] challenges. Our tracker shows promising results compared to those trackers.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128075074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707381
Visualizing Compression of Deep Learning Models for Classification
Marissa Dotter, C. Ward
Deep learning models have made great strides in tasks like classification and object detection. However, these models are often computationally intensive, require vast amounts of in-domain data, and typically contain millions or even billions of parameters. They are also relative black boxes when it comes to interpreting and analyzing their behavior on data or evaluating the network's suitability for the data that is available. To address these issues, we investigate off-the-shelf compression techniques that help reduce the dimensionality of the parameter space within a convolutional neural network. In this way, compression allows us to interpret and evaluate the network more efficiently, since only important features are propagated through the network.
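The sketch below shows one off-the-shelf compression technique of the kind the abstract refers to: L1 magnitude pruning of convolutional weights with PyTorch's pruning utilities. The toy model, the pruning amount, and the choice of library are assumptions for illustration and need not match the authors' setup.

```python
# Off-the-shelf compression example: prune the 40% smallest-magnitude weights
# in each conv layer, then report the resulting sparsity per layer.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.4)

# Sparsity makes it easier to see which weights actually carry information.
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        zero_frac = float((module.weight == 0).sum()) / module.weight.numel()
        print(f"{name}: {zero_frac:.0%} of weights pruned")
```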
{"title":"Visualizing Compression of Deep Learning Models for Classification","authors":"Marissa Dotter, C. Ward","doi":"10.1109/AIPR.2018.8707381","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707381","url":null,"abstract":"Deep learning models have made great strides in tasks like classification and object detection. However, these models are often computationally intensive, require vast amounts of data in the domain, and typically contain millions or even billions of parameters. They are also relative black-boxes when it comes to being able to interpret and analyze their functionality on data or evaluating the suitability of the network for the data that is available. To address these issues, we investigate compression techniques available off-the-shelf that aid in reducing the dimensionality of the parameter space within a Convolutional Neural Network. In this way, compression will allow us to interpret and evaluate the network more efficiently as only important features will be propagated throughout the network.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114909879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-10-01 | DOI: 10.1109/AIPR.2018.8707379
Segmentation of Thermal Breast Images Using Convolutional and Deconvolutional Neural Networks
Shuyue Guan, Nada Kamona, M. Loew
Breast cancer is the second leading cause of cancer death for women in the U.S. Early detection has been shown to be the key to higher survival rates for breast cancer patients. We are investigating infrared thermography as a noninvasive adjunct to mammography for breast screening. Thermal imaging is safe, radiation-free, pain-free, and non-contact. Segmenting the breast area from the acquired thermal images helps limit the region searched for tumors and reduces the time and effort needed for manual segmentation. Autoencoder-like convolutional and deconvolutional neural networks (C-DCNN) are a promising computational approach to automatically segmenting breast areas in thermal images. In this study, we apply a C-DCNN to segment breast areas in our thermal breast-image database, which we are collecting in clinical trials by imaging breast cancer patients with our infrared camera (N2 Imager). For training the C-DCNN, the inputs are 132 gray-value thermal images and the corresponding manually cropped breast-area images (binary masks designating the breast areas). For testing, we input thermal images to the trained C-DCNN, and the outputs after post-processing are the binary breast-area images. Cross-validation and comparison with the ground-truth images show that the C-DCNN is a promising method for segmenting breast areas. The results demonstrate the capability of the C-DCNN to learn the essential features of breast regions and delineate them in thermal images.
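A minimal sketch of an autoencoder-like convolution/deconvolution network of this kind is shown below, mapping a single-channel thermal image to a per-pixel breast-area probability. The layer sizes, image resolution, and loss are illustrative choices, not the authors' architecture.

```python
# Minimal convolution/deconvolution (encoder-decoder) segmenter: one channel in,
# one per-pixel "breast area" probability out, trained against binary masks.
import torch
import torch.nn as nn

class CDCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # H -> H/2
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # H/2 -> H/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # H/4 -> H/2
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # H/2 -> H
            nn.Sigmoid(),   # per-pixel probability of "breast area"
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = CDCNN()
thermal = torch.randn(4, 1, 128, 128)                      # gray-value thermal images
masks = torch.randint(0, 2, (4, 1, 128, 128)).float()      # manually cropped binary masks
loss = nn.BCELoss()(net(thermal), masks)                   # one illustrative training step
loss.backward()
print(net(thermal).shape)                                  # torch.Size([4, 1, 128, 128])
```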
{"title":"Segmentation of Thermal Breast Images Using Convolutional and Deconvolutional Neural Networks","authors":"Shuyue Guan, Nada Kamona, M. Loew","doi":"10.1109/AIPR.2018.8707379","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707379","url":null,"abstract":"Breast cancer is the second leading cause of death for women in the U.S. Early detection of breast cancer has been shown to be the key to higher survival rates for breast cancer patients. We are investigating infrared thermography as a noninvasive adjunctive to mammography for breast screening. Thermal imaging is safe, radiation-free, pain-free, and non-contact. Segmentation of breast area from the acquired thermal images will help limit the area for tumor search and reduce the time and effort needed for manual hand segmentation. Autoencoder-like convolutional and deconvolutional neural networks (C-DCNN) are promising computational approaches to automatically segment breast areas in thermal images. In this study, we apply the C-DCNN to segment breast areas from our thermal breast images database, which we are collecting in our clinical trials by imaging breast cancer patients with our infrared camera (N2 Imager). For training the C-DCNN, the inputs are 132 gray-value thermal images and the corresponding manually-cropped breast area images (binary masks to designate the breast areas). For testing, we input thermal images to the trained C-DCNN and the output after post-processing are the binary breast-area images. Cross-validation and comparison with the ground-truth images show that the C-DCNN is a promising method to segment breast areas. The results demonstrate the capability of C-DCNN to learn essential features of breast regions and delineate them in thermal images.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133206014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-07-02 | DOI: 10.1109/AIPR.2018.8707386
Acquiring Abstract Visual Knowledge of the Real-World Environment for Autonomous Vehicles
I. F. Ghalyan, V. Kapila
This paper considers the problem of modeling the surrounding environment of a driven car using the images captured by a dash cam during driving. Inspired by a human driver's interpretation of the car's surroundings, an abstract representation of the environment is developed that can facilitate decision-making to prevent collisions with surrounding objects. The proposed technique uses the dash cam to capture images as the car is driven through a variety of situations and past various obstacles. Relying on the human driver's interpretation of these driving scenarios, the images of the car's surroundings are manually grouped into classes that reflect the driver's abstract knowledge. Grouping the images allows the knowledge-transfer process from the human driver to an autonomous vehicle to be formulated as a classification problem, producing a meaningful and efficient representation of models arising from real-world scenarios. A convolutional neural network (CNN) is employed to model the car's surrounding environment, encapsulating the abstract knowledge of the human driver. The proposed modeling approach is evaluated in two experimental scenarios. The first experiment considers a highway driving scenario with three classes; the second addresses driving in a residential area with six classes. Excellent modeling performance is reported for both experiments, and comparisons with alternative image classification techniques show the superiority of the CNN for modeling the considered driving scenarios.
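The sketch below illustrates the classification formulation described above by fine-tuning a pretrained CNN backbone to map dash-cam frames to a small set of driver-defined classes. The backbone (ResNet-18), the three class names, and the training settings are assumptions for the example, not the paper's configuration.

```python
# Classification formulation: a pretrained backbone with a new head over a
# small set of driver-defined scene classes (names here are hypothetical).
import torch
import torch.nn as nn
from torchvision import models

classes = ["clear_lane", "vehicle_ahead", "obstacle_near"]   # assumed labels

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, len(classes))

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of dash-cam frames.
frames = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, len(classes), (8,))
optimizer.zero_grad()
loss = criterion(backbone(frames), labels)
loss.backward()
optimizer.step()
print("predicted:", [classes[i] for i in backbone(frames).argmax(1).tolist()])
```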
{"title":"Acquiring Abstract Visual Knowledge of the Real-World Environment for Autonomous Vehicles","authors":"I. F. Ghalyan, V. Kapila","doi":"10.1109/AIPR.2018.8707386","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707386","url":null,"abstract":"This paper considers the problem of modeling the surrounding environment of a driven car by using the images captured by a dash cam during the driving process. Inspired from a human driver’s interpretation of the car’s surrounding environment, an abstract representation of the environment is developed that can facilitate in decision-making to prevent the car’s collisions with surrounding objects. The proposed technique for modeling the car’s surrounding environment utilizes the dash cam to capture images as the car is driven facing multiple situations and obstacles. By relying on the human driver’s interpretation of various driving scenarios, the images of the car’s surrounding environment are manually grouped into classes that reflect the driver’s abstract knowledge. Grouping the images allows the formulation of knowledge transfer process from the human driver to an autonomous vehicle as a classification problem, producing a meaningful and efficient representation of models arising from real-world scenarios. The framework of convolutional neural networks (CNN) is employed to model the surrounding environment of the driven car, encapsulating the abstract knowledge of the human driver. The proposed modeling approach is applied to determine its efficacy in two experimental scenarios. In the first experiment, a highway driving scenario is considered with three classes. Alternatively, in the second experiment, a scenario of driving in a residential area is addressed with six classes. Excellent modeling performance is reported for both experiments. Comparisons conducted with alternative image classification techniques reveal the superiority of the CNN for modeling the considered driving scenarios.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122290969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-07-02 | DOI: 10.1109/AIPR.2018.8707380
Learning Robot-Object Distance Using Bayesian Regression with Application to A Collision Avoidance Scenario
I. F. Ghalyan, Aneesh Jaydeep, V. Kapila
In many practical situations, robots may encounter objects moving in their workspace, with undesirable consequences for either the robots or the moving objects. Such situations often call for sensing arrangements that produce planar images along with depth measurements, e.g., Kinect sensors, to estimate the position of the moving object in 3-D space. In this paper, we aim to estimate the relative distance of a moving object along the axis orthogonal to the camera lens plane, relaxing the need to rely on depth measurements, which are often noisy when the object is too close to the sensor. Specifically, multiple images of an object are first captured at distinct orthogonal distances. In this step, the object's distance from the camera is measured and its normalized area, the normalized sum of its pixels, is computed. Both the computed normalized area and the measured distance are filtered using a Gaussian smoothing filter (GSF). Next, a Bayesian statistical model is developed to map the computed normalized area to the measured distance. The resulting Bayesian linear model predicts the distance between the camera sensor (or robot) and the object from the normalized area computed from the 2-D images. To evaluate the performance of the relative-distance estimation process, a test stand was built consisting of a robot equipped with a camera. During the learning of the statistical model, an ultrasonic sensor was used to measure the distance corresponding to each captured image. After learning the model, the ultrasonic sensor was removed, and excellent performance was achieved when using the model to estimate the distance of an object, a human hand carrying a measuring tape, moving back and forth along the axis normal to the camera plane.
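The sketch below walks through the statistical part of this pipeline under stated assumptions: both measured sequences are smoothed with a Gaussian filter, a Bayesian linear regressor is fit on a feature derived from the normalized area, and the distance is then predicted (with uncertainty) from the area alone. scikit-learn's BayesianRidge and the 1/sqrt(area) feature stand in for the paper's model, and the data are synthetic.

```python
# Gaussian smoothing + Bayesian linear regression from normalized image area
# to camera-object distance, on synthetic data.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Synthetic training data: as the object approaches the camera, its normalized
# image area grows roughly as 1/d^2; both signals carry measurement noise.
distance_cm = np.linspace(20, 120, 200)
norm_area = 1.0 / distance_cm**2 + rng.normal(0.0, 5e-6, distance_cm.size)
distance_meas = distance_cm + rng.normal(0.0, 1.0, distance_cm.size)

# Gaussian smoothing filter (GSF) applied to both measured signals.
area_s = gaussian_filter1d(norm_area, sigma=3)
dist_s = gaussian_filter1d(distance_meas, sigma=3)

# One reasonable (assumed) feature: distance is roughly linear in 1/sqrt(area).
X = (1.0 / np.sqrt(area_s)).reshape(-1, 1)
model = BayesianRidge().fit(X, dist_s)

# Predict distance (with uncertainty) for a new normalized-area measurement.
new_feature = np.array([[1.0 / np.sqrt(4.0e-4)]])   # normalized area of 4e-4
mean, std = model.predict(new_feature, return_std=True)
print(f"estimated distance: {mean[0]:.1f} cm (+/- {std[0]:.1f})")
```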
{"title":"Learning Robot-Object Distance Using Bayesian Regression with Application to A Collision Avoidance Scenario","authors":"I. F. Ghalyan, Aneesh Jaydeep, V. Kapila","doi":"10.1109/AIPR.2018.8707380","DOIUrl":"https://doi.org/10.1109/AIPR.2018.8707380","url":null,"abstract":"In many practical situations, robots may encounter objects moving in their work space, resulting in undesirable consequences for either the robots or the moving objects. Such situations often call for sensing arrangements that can produce planar images along with depth measurements, e.g., Kinect sensors, to estimate the position of the moving object in 3-D space. In this paper, we aim to estimate the relative distance of a moving object along the axis orthogonal to a camera lens plane, thus relaxing the need to rely on depth measurements that are often noisy when the object is too close to the sensor. Specifically, multiple images of an object, with distinct orthogonal distances, are firstly captured. In this step, the object’s distance from the camera is measured and the normalized area, which is the normalized sum of pixels, of the object is computed. Both computed normalized area and measured distance are filtered using a Gaussian smoothing filter (GSF). Next, a Bayesian statistical model is developed to map the computed normalized area with the measured distance. The developed Bayesian linear model allows to predict the distance between the camera sensor (or robot) and the object given the normalized computed area, obtained from the 2-D images, of the object. To evaluate the performance of the relative distance estimation process, a test stand was built that consists of a robot equipped with a camera. During the learning process of the statistical model, an ultrasonic sensor was used for measuring the distance corresponding to the captured images. After learning the model, the ultrasonic sensor was removed and excellent performance was achieved when using the developed model in estimating the distance of an object, a human hand carrying a measurement tape, moving back and forth along the axis normal to the camera plane.","PeriodicalId":230582,"journal":{"name":"2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131584509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}