Efficient hardware implementations routinely approximate mathematical functions with look-up tables, while keeping the error of the approximation under control. For a certain class of commonly occurring 1D functions, namely monotonically increasing or decreasing functions, we found that it is possible to approximate such functions by repeated application of a very low resolution 1D look-up table. There are many advantages to cascading multiple identical LUTs, including the promise of a very simple hardware design and the use of standard linear interpolation. Further, the complexity associated with unequal bin sizes can be avoided. We show that for realistic applications, including gamma correction, high dynamic range encoding and decoding curves, as well as tone mapping and inverse tone mapping applications, multiple cascaded look-up tables can reduce the approximation error by more than 50% compared to a single look-up table with the same total memory footprint.
Linear discriminant analysis has been incorporated with various representations and measurements for dimension reduction and feature extraction. In this paper, we propose two-dimensional quaternion sparse discriminant analysis (2D-QSDA) that meets the requirements of representing RGB and RGB-D images. 2D-QSDA advances in three aspects: 1) including sparse regularization, 2D-QSDA relies only on the important variables, and thus shows good generalization ability to the out-of-sample data which are unseen during the training phase; 2) benefited from quaternion representation, 2D-QSDA well preserves the high order correlation among different image channels and provides a unified approach to extract features from RGB and RGB-D images; 3) the spatial structure of the input images is retained via the matrix-based processing. We tackle the constrained trace ratio problem of 2D-QSDA by solving a corresponding constrained trace difference problem, which is then transformed into a quaternion sparse regression (QSR) model. Afterward, we reformulate the QSR model to an equivalent complex form to avoid the processing of the complicated structure of quaternions. A nested iterative algorithm is designed to learn the solution of 2D-QSDA in the complex space and then we convert this solution back to the quaternion domain. To improve the separability of 2D-QSDA, we further propose 2D-QSDAw using the weighted pairwise between-class distances. Extensive experiments on RGB and RGB-D databases demonstrate the effectiveness of 2D-QSDA and 2D-QSDAw compared with peer competitors.
Abnormal event detection is an important task in video surveillance systems. In this paper, we propose a novel bidirectional multi-scale aggregation networks (BMAN) for abnormal event detection. The proposed BMAN learns spatiotemporal patterns of normal events to detect deviations from the learned normal patterns as abnormalities. The BMAN consists of two main parts: an inter-frame predictor and an appearancemotion joint detector. The inter-frame predictor is devised to encode normal patterns, which generates an inter-frame using bidirectional multi-scale aggregation based on attention. With the feature aggregation, robustness for object scale variations and complex motions is achieved in normal pattern encoding. Based on the encoded normal patterns, abnormal events are detected by the appearance-motion joint detector in which both appearance and motion characteristics of scenes are considered. Comprehensive experiments are performed, and the results show that the proposed method outperforms the existing state-of-the-art methods. The resulting abnormal event detection is interpretable on the visual basis of where the detected events occur. Further, we validate the effectiveness of the proposed network designs by conducting ablation study and feature visualization.
Eye tracking technology in low resolution scenarios is not a completely solved issue to date. The possibility of using eye tracking in a mobile gadget is a challenging objective that would permit to spread this technology to non-explored fields. In this paper, a knowledge based approach is presented to solve gaze estimation in low resolution settings. The understanding of the high resolution paradigm permits to propose alternative models to solve gaze estimation. In this manner, three models are presented: a geometrical model, an interpolation model and a compound model, as solutions for gaze estimation for remote low resolution systems. Since this work considers head position essential to improve gaze accuracy, a method for head pose estimation is also proposed. The methods are validated in an optimal framework, I2Head database, which combines head and gaze data. The experimental validation of the models demonstrates their sensitivity to image processing inaccuracies, critical in the case of the geometrical model. Static and extreme movement scenarios are analyzed showing the higher robustness of compound and geometrical models in the presence of user's displacement. Accuracy values of about 3° have been obtained, increasing to values close to 5° in extreme displacement settings, results fully comparable with the state-of-the-art.
The background Initialization (BI) problem has attracted the attention of researchers in different image/video processing fields. Recently, a tensor-based technique called spatiotemporal slice-based singular value decomposition (SS-SVD) has been proposed for background initialization. SS-SVD applies the SVD on the tensor slices and estimates the background from low-rank information. Despite its efficiency in background initialization, the performance of SS-SVD requires further improvement in the case of complex sequences with challenges such as stationary foreground objects (SFOs), illumination changes, low frame-rate, and clutter. In this paper, a self-motion-assisted tensor completion method is proposed to overcome the limitations of SS-SVD in complex video sequences and enhance the visual appearance of the initialized background. With the proposed method, the motion information, extracted from the sparse portion of the tensor slices, is incorporated with the low-rank information of SS-SVD to eliminate existing artifacts in the initiated background. Efficient blending schemes between the low-rank (background) and sparse (foreground) information of the tensor slices is developed for scenarios such as SFO removal, lighting variation processing, low frame-rate processing, crowdedness estimation, and best frame selection. The performance of the proposed method on video sequences with complex scenarios is compared with the top-ranked state-of-the-art techniques in the field of background initialization. The results not only validate the improved performance over the majority of the tested challenges but also demonstrate the capability of the proposed method to initialize the background in less computational time.
Recently, based on a new tensor algebraic framework for third-order tensors, the tensor singular value decomposition (t-SVD) and its associated tubal rank definition have shed new light on low-rank tensor modeling. Its applications to robust image/video recovery and background modeling show promising performance due to its superior capability in modeling cross-channel/frame information. Under the t-SVD framework, we propose a new tensor norm called tensor spectral k-support norm (TSP-k) by an alternative convex relaxation. As an interpolation between the existing tensor nuclear norm (TNN) and tensor Frobenius norm (TFN), it is able to simultaneously drive minor singular values to zero to induce low-rankness, and to capture more global information for better preserving intrinsic structure. We provide the proximal operator and the polar operator for the TSP-k norm as key optimization blocks, along with two showcase optimization algorithms for medium-and large-size tensors. Experiments on synthetic, image and video datasets in medium and large sizes, all verify the superiority of the TSP-k norm and the effectiveness of both optimization methods in comparison with the existing counterparts.
Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease (PD), a neurodegenerative disorder which impacts millions of people around the world. Accurate assessment of FoG is critical for the management of PD and to evaluate the efficacy of treatments. Currently, the assessment of FoG requires well-trained experts to perform time-consuming annotations via vision-based observations. Thus, automatic FoG detection algorithms are needed. In this study, we formulate vision-based FoG detection, as a fine-grained graph sequence modelling task, by representing the anatomic joints in each temporal segment with a directed graph, since FoG events can be observed through the motion patterns of joints. A novel deep learning method is proposed, namely graph sequence recurrent neural network (GS-RNN), to characterize the FoG patterns by devising graph recurrent cells, which take graph sequences of dynamic structures as inputs. For the cases of which prior edge annotations are not available, a data-driven based adjacency estimation method is further proposed. To the best of our knowledge, this is one of the first studies on vision-based FoG detection using deep neural networks designed for graph sequences of dynamic structures. Experimental results on more than 150 videos collected from 45 patients demonstrated promising performance of the proposed GS-RNN for FoG detection with an AUC value of 0.90.
Free viewpoint video (FVV) has received considerable attention owing to its widespread applications in several areas such as immersive entertainment, remote surveillance and distanced education. Since FVV images are synthesized via a depth image-based rendering (DIBR) procedure in the "blind" environment (without reference images), a real-time and reliable blind quality assessment metric is urgently required. However, the existing image quality assessment metrics are insensitive to the geometric distortions engendered by DIBR. In this research, a novel blind method of DIBR-synthesized images is proposed based on measuring geometric distortion, global sharpness and image complexity. First, a DIBR-synthesized image is decomposed into wavelet subbands by using discrete wavelet transform. Then, the Canny operator is employed to detect the edges of the binarized low-frequency subband and high-frequency subbands. The edge similarities between the binarized low-frequency subband and high-frequency subbands are further computed to quantify geometric distortions in DIBR-synthesized images. Second, the log-energies of wavelet subbands are calculated to evaluate global sharpness in DIBR-synthesized images. Third, a hybrid filter combining the autoregressive and bilateral filters is adopted to compute image complexity. Finally, the overall quality score is derived to normalize geometric distortion and global sharpness by the image complexity. Experiments show that our proposed quality method is superior to the competing reference-free state-of-the-art DIBR-synthesized image quality models.