Detecting hand-palm orientation and hand shapes for sign language gesture recognition using 3D images
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466652
L. K. Phadtare, R. Kushalnagar, N. Cahill
Automatic gesture recognition, specifically for the purpose of understanding sign language, can be an important aid in communicating with the deaf and hard-of-hearing. Recognition of sign languages requires understanding of various linguistic components such as palm orientation, hand shape, hand location, and facial expression. We propose a method and system to estimate the palm orientation and hand shape of a signer. Our system uses a Microsoft Kinect to capture color and depth images of the signer. It analyzes the depth data corresponding to the hand region, fits a plane to these points, and defines the normal to this plane as the orientation of the palm. It then uses 3-D shape context to determine the hand shape by comparing it to example shapes in a database. The estimated palm orientation was found to be correct across varying poses. The shape context method for hand shape classification identified 20 test hand shapes correctly; the remaining 10 were matched to different but very similar shapes.
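The plane fit described here can be reproduced with a standard least-squares approach: the normal of the best-fit plane is the singular vector associated with the smallest singular value of the mean-centered point cloud. The sketch below is a minimal illustration under that assumption, not the authors' implementation; it assumes the hand region has already been segmented from the depth map into an N×3 point array (the name `hand_points` is hypothetical).

```python
import numpy as np

def palm_orientation(hand_points):
    """Estimate palm orientation as the normal of a least-squares plane.

    hand_points: (N, 3) array of 3D points from the segmented hand region
    (assumed already extracted from the Kinect depth map). Returns a unit
    normal; its sign is ambiguous, so callers may flip it toward the camera.
    """
    centered = hand_points - hand_points.mean(axis=0)
    # The right-singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return normal / np.linalg.norm(normal)
```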
{"title":"Detecting hand-palm orientation and hand shapes for sign language gesture recognition using 3D images","authors":"L. K. Phadtare, R. Kushalnagar, N. Cahill","doi":"10.1109/WNYIPW.2012.6466652","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466652","url":null,"abstract":"Automatic gesture recognition, specifically for the purpose of understanding sign language, can be an important aid in communicating with the deaf and hard-of-hearing. Recognition of sign languages requires understanding of various linguistic components such as palm orientation, hand shape, hand location and facial expression. We propose a method and system to estimate the palm orientation and the hand shape of a signer. Our system uses Microsoft Kinect to capture color and the depth images of a signer. It analyzes the depth data corresponding to the hand point region and fits plane to this data and defines the normal to this plane as the orientation of the palm. Then it uses 3-D shape context to determine the hand shape by comparing it to example shapes in the database. Palm orientation of the hand was found to be correct in varying poses. The shape context method for hand shape classification was found to identify 20 test hand shapes correctly and 10 shapes were matched to other but very similar shapes.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117139377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study of the use of SIMD instructions for two image processing algorithms
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466650
E. Welch, D. Patru, E. Saber, K. Bengtson
Most image processing algorithms are parallelizable, i.e., the computation of one pixel does not depend on another. SIMD architectures, including Intel's WMMX and SSE and ARM's NEON, can exploit this fact by processing multiple pixels at a time, which can result in significant speedups. This study investigates the use of NEON SIMD instructions for two image processing algorithms. The algorithms are altered to process four pixels at a time, for which a theoretical speedup factor of four is achievable. In addition, parts of the original implementation were replaced with inline functions or modified at the assembly level. Experimental benchmark data show the actual execution speed to be two to three times higher than that of the original reference implementation. These results demonstrate that SIMD instructions can significantly speed up image processing algorithms through proper code manipulation.
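The four-pixels-at-a-time idea maps onto any data-parallel substrate. The paper works with NEON intrinsics in C; purely as a language-neutral analogy (not the authors' code, and the brightness-scaling operation is an assumption), the numpy sketch below contrasts a scalar per-pixel loop with a vectorized form that, like SIMD, applies one operation across many pixels at once.

```python
import numpy as np

def scale_scalar(img, gain):
    """Per-pixel loop: one multiply-and-clip at a time (SISD-style)."""
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = min(int(img[i, j]) * gain, 255)
    return out

def scale_vectorized(img, gain):
    """Whole-array form: the runtime applies the same operation to many
    pixels per instruction, analogous to a 4-wide NEON lane."""
    return np.minimum(img.astype(np.int32) * gain, 255).astype(np.uint8)

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
assert np.array_equal(scale_scalar(img, 2), scale_vectorized(img, 2))
```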
{"title":"A study of the use of SIMD instructions for two image processing algorithms","authors":"E. Welch, D. Patru, E. Saber, K. Bengtson","doi":"10.1109/WNYIPW.2012.6466650","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466650","url":null,"abstract":"Most image processing algorithms are parallelizable, i.e. the calculation of one pixel does not affect another one. SIMD architectures, including Intel's WMMX and SSE and ARM's NEON, can exploit this fact by processing multiple pixels at a time, which can result in significant speedups. This study investigates the use of NEON SIMD instructions for two image processing algorithms. The latter are altered to process four pixels at a time, for which a theoretical speedup factor of four can be achieved. In addition, parts of the original implementation have been replaced with inline functions or modified at assembly code level. Experimental benchmark data shows the actual execution speed to be between two to three times higher than the original reference. These results prove that SIMD instructions can significantly speedup image processing algorithms through proper code manipulations.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131338106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video detection anomaly via low-rank and sparse decompositions
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466649
Lam Tran, C. Navasca, Jiebo Luo
In this paper, we propose a method for anomaly detection in surveillance video in a tensor framework. We treat a video as a tensor and use stable PCA to decompose it into two tensors: a low-rank tensor consisting of the background pixels and a sparse tensor consisting of the foreground pixels. The sparse tensor is then analyzed to detect anomalies. The proposed method is a one-shot framework for determining which frames in a video are anomalous.
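Low-rank plus sparse separation of this kind is commonly computed with robust PCA via principal component pursuit. The sketch below uses the standard inexact augmented Lagrange multiplier (IALM) iteration on a matricized video (one vectorized frame per column); it is a generic illustration of the decomposition, not the authors' tensor formulation.

```python
import numpy as np

def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Decompose M into low-rank L (background) + sparse S (foreground)
    by principal component pursuit (inexact ALM). M is a matricized
    video: one vectorized frame per column."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    fro = np.linalg.norm(M)
    sigma1 = np.linalg.norm(M, 2)               # largest singular value
    Y = M / max(sigma1, np.abs(M).max() / lam)  # dual variable init
    mu, rho = 1.25 / sigma1, 1.5
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # S-update: elementwise soft thresholding
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0)
        Z = M - L - S                           # primal residual
        Y = Y + mu * Z
        mu = mu * rho
        if np.linalg.norm(Z) / fro < tol:
            break
    return L, S
```

Frames whose columns of S carry unusually large energy are then the candidates for anomaly analysis.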
{"title":"Video detection anomaly via low-rank and sparse decompositions","authors":"Lam Tran, C. Navasca, Jiebo Luo","doi":"10.1109/WNYIPW.2012.6466649","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466649","url":null,"abstract":"In this paper, we purpose a method for anomaly detection in surveillance video in a tensor framework. We treat a video as a tensor and utilize a stable PCA to decompose it into two tensors, the first tensor is a low rank tensor that consists of background pixels and the second tensor is a sparse tensor that consists of the foreground pixels. The sparse tensor is then analyzed to detect anomaly. The proposed method is a one-shot framework to determine frames that are anomalous in a video.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124248192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight, low-cost, side-mounted mobile eye tracking system
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466645
A. K. A. Hong, J. Pelz, J. Cockburn
Commercial mobile eye tracking systems are readily available, but they are costly and complex. They have the additional disadvantage that the eye cameras are placed directly in the subject's field of view in order to obtain a clear frontal view of the eye. We propose a lightweight, low-cost, side-mounted mobile eye tracking system that uses side-view eye images to estimate the subject's gaze. Cameras are mounted on the side of the head, with curved mirrors splitting the captured frames into scene and eye images. A hybrid algorithm using both feature-based and appearance-based models is designed to accommodate this novel system. Image sequences consisting of 4339 frames from seven subjects were analyzed by the algorithm, resulting in a successful gaze estimation rate of 95.7%.
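Feature-based gaze models typically start from a pupil estimate in the eye image. As a heavily simplified, generic illustration (not the authors' hybrid algorithm; the threshold value and names are assumptions), the sketch below locates a dark pupil in a grayscale side-view eye crop by thresholding and taking the centroid.

```python
import numpy as np

def pupil_centroid(eye_gray, thresh=40):
    """Locate a dark pupil by intensity thresholding + centroid.

    eye_gray: (H, W) uint8 grayscale eye image, assumed already cropped
    out of the mirror-split frame. Returns (row, col) or None.
    """
    mask = eye_gray < thresh        # pupil is the darkest region
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return ys.mean(), xs.mean()
```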
{"title":"Lightweight, low-cost, side-mounted mobile eye tracking system","authors":"A. K. A. Hong, J. Pelz, J. Cockburn","doi":"10.1109/WNYIPW.2012.6466645","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466645","url":null,"abstract":"Commercial mobile eye tracking systems are readily available, but are costly and complex. They have an additional disadvantage in that the eye cameras are placed directly in the field of view of the subject in order to obtain a clear frontal view of the eye. We propose a lightweight, low-cost, side-mounted mobile eye tracking system that uses side-view eye images to estimate the gaze of the subject. Cameras are mounted on the side of the head using curved mirrors to split the captured frames into scene and eye images. A hybrid algorithm using both feature-based models and appearance-based models is designed to accommodate this novel system. Image sequences, consisting of 4339 frames from seven subjects are analyzed by the algorithm, resulting in a successful gaze estimation rate of 95.7%.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116804628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view action classification using sparse representations on Motion History Images
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466646
S. Azary, A. Savakis
Multi-view action classification is an important component of real-world applications such as automatic surveillance and sports analysis. Motion History Images capture the location and direction of motion in a scene, and sparse representations provide a compact encoding of high-dimensional signals. In this paper, we propose a multi-view action classification algorithm based on sparse representations of spatio-temporal action descriptors built from Motion History Images. We find that this approach is effective for multi-view action classification, and experiments on the i3DPost Multi-view Dataset achieve high classification rates.
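A Motion History Image assigns each pixel the timestamp of its most recent motion and clears motion older than a fixed duration, so recency appears as brightness. The sketch below is a minimal per-frame MHI update over binary silhouettes; how the silhouettes are obtained, and the sparse-representation classifier built on top, are outside this illustration.

```python
import numpy as np

def update_mhi(mhi, silhouette, timestamp, duration):
    """One Motion History Image update step.

    mhi:        (H, W) float array of last-motion timestamps
    silhouette: (H, W) boolean motion mask for the current frame
    Pixels moving now receive the current timestamp; pixels whose last
    motion is older than `duration` are cleared to zero.
    """
    mhi = mhi.copy()
    mhi[silhouette] = timestamp
    mhi[~silhouette & (mhi < timestamp - duration)] = 0
    return mhi

# Usage: fold over a silhouette sequence, then normalize for display.
# mhi = np.zeros((h, w))
# for t, sil in enumerate(silhouettes, start=1):
#     mhi = update_mhi(mhi, sil, t, duration=20)
```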
{"title":"Multi-view action classification using sparse representations on Motion History Images","authors":"S. Azary, A. Savakis","doi":"10.1109/WNYIPW.2012.6466646","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466646","url":null,"abstract":"Multi-view action classification is an important component of real world applications such as automatic surveillance and sports analysis. Motion History Images capture the location and direction of motion in a scene and sparse representations provide a compact representation of high dimensional signals. In this paper, we propose a multi-view action classification algorithm based on sparse representation of spatio-temporal action representations using motion history images. We find that this approach is effective at multi-view action classification and experiments with the i3DPost Multi-view Dataset achieve high classification rates.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131105302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient SMQT features for snow-based classification on face detection and character recognition tasks
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466644
Y. Artan, A. Burry, V. Kozitsky, P. Paul
Face detection using local successive mean quantization transform (SMQT) features and the sparse network of winnows (SNoW) classifier has received interest in the computer vision community due to its success under varying illumination conditions. Recent work has also demonstrated the effectiveness of this classification technique for character recognition tasks. However, the heavy storage requirements of the SNoW classifier necessitate efficient techniques for reducing storage and computation. This study shows that a SNoW classifier built with only a limited number of distinguishing SMQT features provides performance comparable to the original dense SNoW classifier. Initial results on the well-known CMU-MIT facial image database and a private character database demonstrate the effectiveness of the proposed method.
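The SMQT itself has a simple recursive structure: at each level, values are compared against the mean of their current subset, the comparison bit is appended to each value's code, and the two halves are recursed. A minimal 1-D sketch is below; it illustrates the transform only, not the paper's SNoW feature pruning.

```python
import numpy as np

def smqt(values, levels):
    """Successive Mean Quantization Transform (1-D sketch).

    Each element receives one bit per level: 1 if it exceeds the mean
    of its current subset, else 0. Subsets split around the mean and
    recurse, so each output code is an integer in [0, 2**levels).
    """
    values = np.asarray(values, dtype=float)
    codes = np.zeros(len(values), dtype=int)

    def split(idx, level):
        if level == 0 or len(idx) == 0:
            return
        upper = values[idx] > values[idx].mean()
        codes[idx] = (codes[idx] << 1) | upper.astype(int)
        split(idx[upper], level - 1)
        split(idx[~upper], level - 1)

    split(np.arange(len(values)), levels)
    return codes
```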
{"title":"Efficient SMQT features for snow-based classification on face detection and character recognition tasks","authors":"Y. Artan, A. Burry, V. Kozitsky, P. Paul","doi":"10.1109/WNYIPW.2012.6466644","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466644","url":null,"abstract":"Face detection using local successive mean quantization transform (SMQT) features and the sparse network of winnows (SNoW) classifier has received interest in the computer vision community due to its success under varying illumination conditions. Recent work has also demonstrated the effectiveness of this classification technique for character recognition tasks. However, heavy storage requirements of the SNoW classifier necessitate the development of efficient techniques to reduce storage and computational requirements. This study shows that the SNoW classifier built with only a limited number of distinguishing SMQT features provides comparable performance to the original dense snow classifier. Initial results using the well-known CMU-MIT facial image database and a private character database are used to demonstrate the effectiveness of the proposed method.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122101169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A multi-channel approach for segmentation of solar corona in images from the solar dynamics observatory
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466653
S. Suresh, R. Dube
We present a multi-channel segmentation scheme to identify different features of the solar corona, such as coronal holes, active regions, and the quiet sun, especially in ultraviolet and extreme-ultraviolet images. In contrast to common techniques, our approach uses both image intensity and the relative contribution of each wavelength. The approach is illustrated on images taken by the AIA telescopes onboard the SDO mission. The technique uses a nearest-neighbor classifier followed by a Moore-neighbor tracing algorithm to find the boundaries of the regions of interest and track them. The method requires less computation time than the commonly used fuzzy-logic methods and is robust in the sense that it performs equally well in both the central and limb regions of the solar disc.
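The nearest-neighbor stage amounts to classifying each pixel by its multi-wavelength intensity vector against labeled exemplars. Below is a generic scikit-learn sketch under that reading (the channel stack, training samples, and class labels are hypothetical; the Moore-neighbor boundary tracing step is not shown).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def segment_corona(channels, train_pixels, train_labels, k=5):
    """Per-pixel nearest-neighbor segmentation.

    channels:     (H, W, C) stack of co-registered AIA wavelength images
    train_pixels: (N, C) labeled intensity vectors (e.g. hand-picked
                  coronal-hole / active-region / quiet-sun samples)
    train_labels: (N,) integer class per training sample
    Returns an (H, W) label map.
    """
    h, w, c = channels.shape
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_pixels, train_labels)
    return knn.predict(channels.reshape(-1, c)).reshape(h, w)
```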
{"title":"A multi-channel approach for segmentation of solar corona in images from the solar dynamics observatory","authors":"S. Suresh, R. Dube","doi":"10.1109/WNYIPW.2012.6466653","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466653","url":null,"abstract":"We present a multi-channel segmentation scheme to identify different features of the solar corona, such as coronal holes, active regions and the quiet sun (especially in the ultraviolet and extreme ultraviolet images). In contrast to common techniques, we use an approach that uses image intensity and relative contribution of each of the wavelengths. This approach is illustrated by using the images taken by the AIA telescopes onboard of the SDO mission. This technique incorporates a nearest-neighbor based classifier followed by Moore-neighbor tracing algorithm to find the boundaries and track the regions of interest. This method requires less computation time as compared to the commonly used fuzzy logic methods and is robust in the sense it performs equally well in both the central and limb regions of the solar disc.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"4 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116826289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion tracking for realtime, offline image stabilization with limited hardware
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466643
J. Sheaffer, M. Moore, M. Bobrov, A. Webster, M. Torres
We present a work-in-progress system for tracking small objects with limited features to produce stabilized, successive images for frame-to-frame analysis in a specialized hand-held digital microscopy environment with limited resolution, limited processing capability, and soft real-time requirements. Our system can track dirt or imperfections as small as about 10 μm across the end face of a fiber optic cable under a moving camera. It must locate three distinct features and track each of them from frame to frame. The measured feature positions are used to calculate transformation matrices relative to a selected basis image and to move each image in the stack into the coordinate frame of the basis image, so that focus stacking can be performed on the set of images. All of this must be completed in under a second on a low-power, hand-held device.
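Three tracked features are exactly enough to determine a 2-D affine transform between a frame and the basis image (6 equations, 6 unknowns). The sketch below solves for that transform directly; it is a generic illustration under the affine assumption, not the authors' implementation.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Solve the 2-D affine transform mapping three src points onto
    three dst points.

    src, dst: (3, 2) arrays of (x, y) feature positions in the current
    frame and the basis image, respectively.
    Returns a 3x3 homogeneous matrix T with [x', y', 1]^T = T @ [x, y, 1]^T.
    """
    P = np.column_stack([src, np.ones(3)])  # rows [x, y, 1]
    M = np.linalg.solve(P, dst)             # (3, 2): P @ M = dst
    T = np.eye(3)
    T[:2, :] = M.T                          # top two rows: A | t
    return T
```

Warping each stack image by its T into the basis frame then lines the stack up for per-pixel focus stacking.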
{"title":"Motion tracking for realtime, offline image stabilization with limited hardware","authors":"J. Sheaffer, M. Moore, M. Bobrov, A. Webster, M. Torres","doi":"10.1109/WNYIPW.2012.6466643","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466643","url":null,"abstract":"We present a work-in-progress system for tracking small objects with limited features to produce stabilized, successive images for frame-to-frame analysis in a specialized, hand-held, digital microscopy environment with limited resolution and processing capability and soft real-time requirements. Our system is able to track dirt or imperfections as small as about 10 μm across the end face of a fiber optic cable under a moving camera. It must locate three distinct features and track each of them from frame to frame. The measured positions of the features are used to calculate transformation matrices relative to a selected basis image and move each image in the image stack into the coordinate frame of the basis image so that we can perform focus stacking on the set of images. All of this must be completed in under a second on a low-power, hand-held device.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127833651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color is not a metric space: implications for pattern recognition, machine learning, and computer vision
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466642
Thomas B. Kinsman, M. Fairchild, J. Pelz
Using a metric feature space for pattern recognition, data mining, and machine learning greatly simplifies the mathematics because distances are preserved under rotation and translation in feature space. A metric space also provides a "ruler", an absolute measure of how different two feature vectors are. In the computer vision community, color can easily be mistakenly treated as a metric distance. This paper serves as an introduction to why working in a non-metric space is a challenge, and details why color is not a valid Euclidean distance metric.
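One concrete way the metric assumption breaks: hue is an angle, so a naive Euclidean difference on stored hue values wildly overstates the distance across the 0°/360° seam. The snippet below demonstrates this one failure mode with illustrative numbers; it is only one of the ways color resists Euclidean treatment (perceptual non-uniformity of RGB is another).

```python
# Two reddish hues that are perceptually about 20 degrees apart.
h1, h2 = 350.0, 10.0

naive = abs(h1 - h2)                # hue as a plain coordinate: 340.0
angular = min(naive, 360 - naive)   # respecting the wraparound: 20.0

print(naive, angular)
# A classifier using naive |h1 - h2| would call these near-identical
# reds almost maximally different.
```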
{"title":"Color is not a metric space implications for pattern recognition, machine learning, and computer vision","authors":"Thomas B. Kinsman, M. Fairchild, J. Pelz","doi":"10.1109/WNYIPW.2012.6466642","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466642","url":null,"abstract":"Using a metric feature space for pattern recognition, data mining, and machine learning greatly simplifies the mathematics because distances are preserved under rotation and translation in feature space. A metric space also provides a “ruler”, or absolute measure of how different two feature vectors are. In the computer vision community color can easily be miss-treated as a metric distance. This paper serves as an introduction to why using a non-metric space is a challenge, and provides details of why color is not a valid Euclidean distance metric.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132053030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sphere2: Jerry's rig, an OpenGL application for non-linear panorama viewing and interaction
Pub Date: 2012-11-01, DOI: 10.1109/WNYIPW.2012.6466648
T. P. Keane, N. Cahill, H. Rhody, B. Hu, J. Tarduno, R. Jacobs, J. Pelz
Given a set of images, or time-lapsed imagery, captured in an unconstrained domain, there are numerous methods to map that data into a domain readily displayable on basic rectilinear digital displays. However, while these mappings can be mathematically sound, they modify the spatio-temporal "scale" of the scene and thus can become semantically confusing when viewed in this manner. In working with data to be mathematically and scientifically analyzed, we have found that we also require a level of semantic consistency, such as when providing accessible imagery for expert annotation or generating "virtual field trips". We have therefore applied the same mathematical rigor to instead develop an interactive, OpenGL-based viewing application for spatio-temporally non-linear imagery. The application is extensible and entirely custom, so that we can not only interact with single sets of related imagery, but also view the relationships across sets of imagery in a semantically viable, and mathematically sound, domain.
{"title":"Sphere2: Jerry's rig, an OpenGL application for non-linear panorama viewing and interaction","authors":"T. P. Keane, N. Cahill, H. Rhody, B. Hu, J. Tarduno, R. Jacobs, J. Pelz","doi":"10.1109/WNYIPW.2012.6466648","DOIUrl":"https://doi.org/10.1109/WNYIPW.2012.6466648","url":null,"abstract":"Given a set of images, or time-lapsed imagery, that is captured in an unconstrained domain, there are numerous methods to map that data into a domain that is readily displayable on basic rectilinear digital displays. However, while these mappings can be mathematically sound, they are methods that modify the spatio-temporal “scale” of the scene, and thus can become quite semantically confusing when viewed in this manner. In working with data to be mathematically and scientifically analyzed, we have found that we also require a level of semantic consistency, such as in providing accessible imagery for expert annotation or in generating “virtual field trips”. This is why we have applied the same mathematical rigor to instead develop an interactive, OpenGL-based viewing application for spatio-temporally non-linear imagery. This application is extensible and entirely custom, such that we can not only interact with single sets of related imagery, but also view the relationships across sets of imagery in a semantically viable, and mathematically sound, domain.","PeriodicalId":218110,"journal":{"name":"2012 Western New York Image Processing Workshop","volume":"27 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115352697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}