Pub Date: 2017-12-01 DOI: 10.1109/ICSIPA.2017.8120601
Ahmed Kamil Hasan Al-Ali, B. Senadji, G. Naik
The performance of forensic speaker verification degrades severely in the presence of high levels of environmental noise and reverberation. Multi-channel speech enhancement algorithms are a possible solution for reducing the effect of environmental noise on noisy speech signals. Although multi-run independent component analysis (ICA) has been used in previous studies to improve recognition performance in biosignal applications, its effectiveness in improving noisy forensic speaker verification under reverberation conditions has not yet been investigated. In this paper, the multi-run ICA algorithm is used to enhance noisy speech signals by choosing, from the mixing matrices generated by iterating the FastICA algorithm several times, the one with the highest signal-to-interference ratio (SIR). A wavelet-based mel-frequency cepstral coefficient (MFCC) feature-warping approach is then applied to the enhanced speech signals to extract features that are robust to environmental noise and reverberation. The state-of-the-art intermediate vector (i-vector) representation with probabilistic linear discriminant analysis (PLDA) is used as the classifier in our approach. Experimental results show that forensic speaker verification based on the multi-run ICA algorithm achieves significant improvements in equal error rate (EER) of 60.88%, 51.84% and 66.15% over the baseline noisy speaker verification when the enrolment speech signals are reverberated at 0.15 s and the test speech signals are mixed with STREET, CAR and HOME noises, respectively, at a signal-to-noise ratio (SNR) of −10 dB.
Title: "Enhanced forensic speaker verification using multi-run ICA in the presence of environmental noise and reverberation conditions"
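The core selection step, rerunning FastICA and keeping the run whose output scores the highest SIR, can be sketched as follows. This is a minimal illustration, assuming scikit-learn's FastICA and synthetic stand-in sources; the SIR here is computed against known reference signals for clarity, whereas a forensic setting would require a blind estimate.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)
s1 = np.sin(2 * np.pi * 5 * t)                # stand-in "speech" source
s2 = np.sign(np.sin(2 * np.pi * 3 * t))       # stand-in "noise" source
S = np.c_[s1, s2]
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # unknown mixing matrix
X = S @ A.T                                   # observed two-channel mixture

def sir_db(est, ref):
    """Crude SIR: energy of the ref-aligned part of est vs the residual."""
    est = (est - est.mean()) / est.std()
    ref = (ref - ref.mean()) / ref.std()
    target = (est @ ref / (ref @ ref)) * ref
    interference = est - target
    return 10 * np.log10((target @ target) / (interference @ interference))

best_sir, best_est = -np.inf, None
for seed in range(10):                        # the "multi-run" loop: re-seed FastICA
    S_hat = FastICA(n_components=2, random_state=seed).fit_transform(X)
    run_sir = max(sir_db(S_hat[:, i], S[:, j])
                  for i in range(2) for j in range(2))
    if run_sir > best_sir:
        best_sir, best_est = run_sir, S_hat   # keep the highest-SIR run
```

The retained `best_est` then plays the role of the enhanced signal fed to feature extraction.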
Pub Date: 2017-09-13 DOI: 10.1109/ICSIPA.2017.8120665
KangUn Jo, Jung-Hui Im, Jingu Kim, Dae-Shik Kim
Multi-class multi-object tracking is an important problem for real-world applications such as surveillance, gesture recognition, and robot vision. However, building a multi-class multi-object tracker that works in real time is difficult due to the low processing speed of the detection, classification, and data association tasks. By combining the fast and reliable deep-learning-based detector YOLOv2 with a fast detection-to-track association algorithm, we build a real-time multi-class multi-object tracking system with competitive accuracy.
Title: "A real-time multi-class multi-object tracker using YOLOv2"
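The detection-to-track hand-off that makes such a pipeline fast can be approximated with per-class greedy IoU association; the sketch below is a common pattern, not the paper's exact algorithm, and the dict layout is a stand-in:

```python
def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, min_iou=0.3):
    """Greedy IoU matching; a detection of a different class never matches."""
    pairs, used_tracks = [], set()
    cand = sorted(((iou(t["box"], d["box"]), ti, di)
                   for ti, t in enumerate(tracks)
                   for di, d in enumerate(detections)
                   if t["cls"] == d["cls"]), reverse=True)
    for score, ti, di in cand:
        if score < min_iou or ti in used_tracks or di in {p[1] for p in pairs}:
            continue
        pairs.append((ti, di))        # (track index, detection index)
        used_tracks.add(ti)
    return pairs
```

Unmatched detections would spawn new tracks and unmatched tracks age out; those bookkeeping steps are omitted here.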
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120620
S. Agha, Farmanullah Jan, Dilshad Sabir, Khurram Saleem, Usman Ali Gulzari, Atif Shakeel
The full-search Motion Estimation (M.E.) process is computationally intensive and power consuming, which can make it unsuitable for battery-powered real-time applications. In this work, different M.E. algorithms are presented. Algorithms 1 to 3 are beneficial for low-power, high-throughput VLSI implementation while keeping quality at an optimum level. Three VLSI architectures are presented, corresponding to the three algorithms. Theoretically, Architecture 1 reduces pixel accesses from memory, and hence power consumption, by 23%; Architecture 2 reduces pixel accesses by 48%; and Architecture 3 by 52%. Finally, we present a suboptimal fast M.E. algorithm, a modified form of the diamond search algorithm, which has lower complexity and improved quality compared with the standard diamond search M.E. algorithm.
Title: "Optimal motion estimation using reduced bits and its low power VLSI implementation"
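The diamond search baseline that the last algorithm modifies repeatedly evaluates a large diamond pattern until the best SAD sits at the pattern's centre, then refines with a small diamond. A compact sketch of the standard version (the paper's modification is not reproduced here):

```python
import numpy as np

LDSP = [(0, 0), (0, 2), (0, -2), (2, 0), (-2, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
SDSP = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def sad(ref, cur, y, x, by, bx, B):
    """SAD between the candidate block in ref and the current block in cur."""
    h, w = ref.shape
    if y < 0 or x < 0 or y + B > h or x + B > w:
        return np.inf
    return np.abs(ref[y:y+B, x:x+B].astype(int)
                  - cur[by:by+B, bx:bx+B].astype(int)).sum()

def diamond_search(ref, cur, by, bx, B=8, max_steps=32):
    cy, cx = by, bx
    for _ in range(max_steps):     # large diamond until the centre wins
        costs = [(sad(ref, cur, cy+dy, cx+dx, by, bx, B), dy, dx) for dy, dx in LDSP]
        best = min(costs)
        if (best[1], best[2]) == (0, 0):
            break
        cy, cx = cy + best[1], cx + best[2]
    # one small-diamond refinement step
    best = min((sad(ref, cur, cy+dy, cx+dx, by, bx, B), dy, dx) for dy, dx in SDSP)
    cy, cx = cy + best[1], cx + best[2]
    return cy - by, cx - bx        # motion vector
```

Because every move requires the new candidate to cost no more than the current centre, the returned vector never scores worse than the zero-motion candidate.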
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120673
Umma Hany, L. Akter
Accurate localization of a wireless video capsule endoscope (VCE) is a crucial requirement for proper diagnosis of intestinal abnormalities. A major challenge in RF-based localization is the shadow fading and multipath propagation caused by the non-homogeneous medium of the human body, which produce large random deviations in the measured path loss and hence high localization error. To address this randomness in the scattered path loss, we propose Savitzky-Golay filtering to estimate a smoothed path loss. We then estimate the positions of the moving capsule with a weighted centroid localization (WCL) algorithm, taking the weighted average of the sensor positions, where each sensor receiver's weight is computed from the degree-based estimated smoothed path loss. Finally, we propose two position bounds on the estimated positions to improve localization accuracy, and verify the accuracy using several performance metrics. To validate the proposed algorithm, we develop a simulation platform in MATLAB and observe significant improvement over the literature using the proposed position-bounded, smoothed-path-loss-based WCL, without any prior knowledge of channel parameters or distances.
Title: "Accuracy of endoscopic capsule localization using position bounds on smoothed path loss based WCL"
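A toy numeric sketch of the smoothing-then-WCL chain, using scipy's `savgol_filter` as a stand-in for the paper's Savitzky-Golay stage; the sensor layout, channel constants and the degree parameter are all hypothetical:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)

# four on-body sensor receivers (x, y in cm) and a true capsule position
sensors = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])
capsule = np.array([12.0, 20.0])

d = np.linalg.norm(sensors - capsule, axis=1)
pl_true = 30.0 + 10 * 4.0 * np.log10(d)      # log-distance model, exponent 4
# 101 noisy readings per sensor: heavy shadow-fading scatter (sigma = 6 dB)
pl_meas = pl_true[:, None] + rng.normal(0.0, 6.0, (4, 101))

# Savitzky-Golay smoothing of each sensor's path-loss trace
pl_smooth = savgol_filter(pl_meas, window_length=31, polyorder=3, axis=1)
pl = pl_smooth.mean(axis=1)

# WCL: weight each sensor by its smoothed path loss raised to a degree g
g = 4.0
w = 1.0 / pl ** g
est = (w[:, None] * sensors).sum(axis=0) / w.sum()
err_cm = np.linalg.norm(est - capsule)
```

The estimate lands between the sensors, pulled toward the receivers with the lowest smoothed path loss; the paper's position bounds would then clip it further.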
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120639
F. F. Chamasemani, L. S. Affendey, N. Mustapha, F. Khalid
Much research has been conducted on video abstraction for quick viewing of video archives; however, few approaches consider abstraction as a pre-processing stage in video analysis. This paper investigates the efficiency of integrating video abstraction into a surveillance video indexing and retrieval framework. The basic idea is to reduce the computational complexity and cost of the overall processes by using an abstract version of the original video that excludes unnecessary and redundant information. The experimental results show a significant reduction of 87% in computational cost when the abstract video, rather than the original, is used in both the indexing and retrieval processes.
Title: "Speeded up surveillance video indexing and retrieval using abstraction"
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120632
S. Sari, Y. Y. Chia, M. N. Mohd, N. Taujuddin, Nabilah Ibrahim, H. Roslan
Due to the limitations of low-cost camera technologies, digital images are easily corrupted by various types of noise, such as salt-and-pepper, Gaussian and Poisson noise. For digital images captured in photon-limited low-light conditions, the effect of image noise, especially Poisson noise, is more pronounced, degrading image quality. This study therefore develops a new denoising technique for Poisson noise removal in low-light digital images: the OKWW filter, which combines Otsu thresholding, the Kuwahara filter, the Wiener filter and wavelet thresholding, and is designed for removing high levels of Poisson noise. The proposed filter's performance is compared with that of other existing denoising techniques. The results show that the proposed OKWW filter is the best at removing high-level Poisson noise while preserving the edges and fine details of noisy images.
Title: "OKWW filter for poisson noise removal in low-light condition digital image"
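The full OKWW cascade is not reproduced here, but its Wiener stage on a Poisson-noisy low-light image can be sketched with scipy; the synthetic scene and window size are illustrative assumptions:

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(0)

# smooth synthetic low-light scene (peak ~30 photons -> strong Poisson noise)
y, x = np.mgrid[0:64, 0:64]
clean = 10.0 + 20.0 * np.exp(-((y - 32) ** 2 + (x - 32) ** 2) / 400.0)
noisy = rng.poisson(clean).astype(float)

# adaptive Wiener filtering as one stage of a Poisson-denoising pipeline
denoised = wiener(noisy, mysize=5)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

On a smooth scene the 5x5 adaptive Wiener filter averages out most of the signal-dependent Poisson variance while leaving the slowly varying structure largely intact.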
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120646
E. A. Awalludin, M. S. Hitam, W. Yussof, Z. Bachok
In recent years, monitoring of coral reef status and health has been assisted by image processing techniques. Since underwater images always suffer from major drawbacks, research in this area is still active. In this paper, we propose an edge-based segmentation in which we modify the original Canny edge detector and then use blob processing to extract dominant features from the images. We conduct experiments using images extracted from video transects, and the results are promising for estimating the distribution of coral reef components.
Title: "Modification of canny edge detection for coral reef components estimation distribution from underwater video transect"
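A sketch of the edge-plus-blob idea using the stock Canny detector from scikit-image; the paper's modified Canny is not reproduced, and the morphological clean-up step is an assumption of this illustration:

```python
import numpy as np
from skimage import feature, measure
from scipy import ndimage

# synthetic "transect frame": two bright patches on a dark background
img = np.zeros((100, 100))
img[20:40, 20:40] = 1.0
img[60:85, 55:90] = 1.0

edges = feature.canny(img, sigma=2.0)

# close the edge rings and fill them so each patch becomes one blob
filled = ndimage.binary_fill_holes(ndimage.binary_closing(edges, iterations=2))
labels = measure.label(filled)
regions = measure.regionprops(labels)
areas = [r.area for r in regions]   # per-blob statistics for distribution estimates
```

Per-blob statistics such as area and centroid are what a distribution estimate over reef components would aggregate.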
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120633
K. Katayama, K. Shibata, Y. Horita
Three-dimensional nerve information is required for the diagnosis of peripheral neuropathy. We have developed a prototype probe-manipulating device and an algorithm that extracts peripheral nerves from the ultrasound images captured with this probe and produces a three-dimensional median nerve. Unlike images captured by manually manipulating the probe, our device captures each area only once. The images are therefore either partially clear, making it easy to extract nerve contours, or partially unclear (blurry), making extraction difficult. To solve this problem, this paper applies noise reduction and inter-tissue edge emphasis to the ultrasound images and extracts the nerves from the processed images.
Title: "Noise reduction and enhancement of contour for median nerve detection in ultrasonic image"
This paper presents a new discriminative learning framework to associate objects with words in an image and to perform template matching for complex association patterns. The problem is first formulated as a bipartite graph matching problem. A structural support vector machine (SVM) is then employed to obtain the optimal compatibility function encoding the association rules between objects and words. Moreover, an iterative inference procedure is developed to alternately infer the association of visual objects and texts and the selection of the template model. Simulations show that the new method outperforms existing competing counterparts.
Title: "Learning visual object and word association"
Authors: Yie-Tarng Chen, Ting-Zhi Wang, Wen-Hsien Fang, Didik Purwanto
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120577
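Once a compatibility function has been learned, the object-word inference step reduces to maximum-weight bipartite matching, which scipy solves directly; the scores and labels below are hand-set stand-ins for the structural-SVM output, not the paper's data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# hypothetical compatibility scores phi(object, word): higher = better match
objects = ["dog", "ball", "tree"]
words = ["puppy", "toy", "oak"]
phi = np.array([
    [0.9, 0.2, 0.1],   # dog  vs puppy / toy / oak
    [0.3, 0.8, 0.2],   # ball
    [0.1, 0.1, 0.7],   # tree
])

# maximum-weight bipartite matching = Hungarian algorithm on -phi
rows, cols = linear_sum_assignment(-phi)
matches = {objects[i]: words[j] for i, j in zip(rows, cols)}
```

The paper's iterative procedure would alternate this matching step with re-selecting the template model that produced `phi`.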
Pub Date: 2017-09-01 DOI: 10.1109/ICSIPA.2017.8120635
I. Hipiny, Hamimah Ujir, Jacey-Lynn Minoi, Sarah Flora Samson Juan, M. A. Khairuddin, M. Sunar
Unsupervised segmentation of action segments in egocentric videos is a desirable capability for tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments enables faster and less noisy matching. However, there is a substantial gap in machines' understanding of the natural temporal cuts within a continuous human activity. This work reports a novel gaze-based approach for segmenting action segments in videos captured with an egocentric camera. Gaze is used to locate the region of interest inside each frame. By tracking two simple motion-based parameters inside successive regions of interest, we discover a finite set of temporal cuts. We present several results using combinations of the two parameters on a dataset, BRISGAZE-ACTIONS, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures.
Title: "Unsupervised segmentation of action segments in egocentric videos using gaze"
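One plausible reading of the two-parameter cut detector, flagging frames where both motion parameters spike jointly, can be sketched as follows; the traces, the robust-threshold rule and all constants are hypothetical, not the paper's:

```python
import numpy as np

# two per-frame motion parameters inside the gaze region (synthetic traces),
# e.g. mean optical-flow magnitude and gaze-point displacement
rng = np.random.default_rng(0)
frames = 300
p1 = rng.normal(1.0, 0.1, frames)
p2 = rng.normal(0.5, 0.05, frames)
for start in (100, 200):            # simulate two action transitions
    p1[start:start + 5] += 3.0
    p2[start:start + 5] += 2.0

def temporal_cuts(p1, p2, k=4.0, min_gap=10):
    """Flag frames where BOTH parameters spike k MADs above their medians."""
    def spikes(p):
        med = np.median(p)
        mad = np.median(np.abs(p - med)) + 1e-9
        return np.abs(p - med) / mad > k
    hits = np.flatnonzero(spikes(p1) & spikes(p2))
    cuts = []
    for f in hits:                  # collapse runs of spike frames into one cut
        if not cuts or f - cuts[-1] >= min_gap:
            cuts.append(int(f))
    return cuts

cuts = temporal_cuts(p1, p2)
```

Requiring both parameters to spike simultaneously is what keeps isolated noise in either trace from producing spurious cuts.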