Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. Detecting novel data while incrementally updating the model efficiently and stably is extremely challenging, especially for high-dimensional and/or large-scale data streams. This paper proposes an efficient framework for novelty detection and incremental learning on unlabeled chunk data streams. First, an accurate factorization-free kernel discriminative analysis (FKDA-X) is put forward through solving a linear system in the kernel space. FKDA-X produces a Reproducing Kernel Hilbert Space (RKHS), in which unlabeled chunk data can be detected and classified into multiple known classes in a single decision model with a deterministic classification boundary. Moreover, based on FKDA-X, two optimal methods, FKDA-CX and FKDA-C, are proposed. FKDA-CX uses the micro-cluster centers of the original data as its input to achieve excellent performance in novelty detection. FKDA-C and incremental FKDA-C (IFKDA-C), which use the class centers of the original data as their input, are extremely fast in online learning. Theoretical analysis and experimental validation on under-sampled and large-scale real-world datasets demonstrate that the proposed algorithms make it possible to learn unlabeled chunk data streams with significantly lower computational costs than, and accuracies comparable to, the state-of-the-art approaches.
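As a rough illustration of a single decision model that both classifies known classes and flags novel samples by solving one linear system in the kernel space, consider the following sketch. It is not FKDA-X: it is plain regularized kernel least-squares on class-indicator targets with a score threshold, and the RBF kernel, the ridge term, the threshold, the toy data, and all function names are assumptions made for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    """RBF kernel value between two points (the kernel choice is an assumption)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def solve(matrix, rhs):
    """Solve matrix @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + list(r) for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit(points, labels, num_classes, ridge=1e-3):
    """Solve (K + ridge * I) A = Y, where Y holds one-hot class indicators."""
    n = len(points)
    gram = [[rbf(points[i], points[j]) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    targets = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)]
               for i in range(n)]
    return solve(gram, targets)

def scores(points, coeffs, x):
    """Per-class scores of a new sample x in the learned kernel model."""
    k = [rbf(p, x) for p in points]
    return [sum(k[i] * coeffs[i][c] for i in range(len(points)))
            for c in range(len(coeffs[0]))]

def predict(points, coeffs, x, threshold=0.5):
    """Return the best known class, or -1 (novel) if no class score is high enough."""
    s = scores(points, coeffs, x)
    best = max(range(len(s)), key=lambda c: s[c])
    return best if s[best] >= threshold else -1

# Toy data: two well-separated known classes.
points = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
labels = [0, 0, 1, 1]
coeffs = fit(points, labels, num_classes=2)

near_class0 = predict(points, coeffs, (0.1, 0.0))
near_class1 = predict(points, coeffs, (3.05, 3.0))
far_away = predict(points, coeffs, (10.0, -10.0))
```

In this toy setting, samples close to the training data receive a class score near 1, while a sample far from every known class receives scores near 0 and falls below the threshold, which is the behaviour a joint classification and novelty-detection model needs.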
This paper conducts a systematic study on the role of visual attention in video object pattern understanding. By elaborately annotating three popular video segmentation datasets (DAVIS 16, Youtube-Objects, and SegTrack V2) with dynamic eye-tracking data in the unsupervised video object segmentation (UVOS) setting, we quantitatively verified, for the first time, the high consistency of visual attention behavior among human observers, and found a strong correlation between human attention and explicit primary object judgments during dynamic, task-driven viewing. Such novel observations provide an in-depth insight into the underlying rationale behind video object patterns. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in the spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in the spatial domain. Our UVOS solution enjoys three major advantages: 1) modular training that avoids expensive video segmentation annotations, instead using more affordable dynamic fixation data to train the initial video attention module and existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on four popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance compared with state-of-the-art methods and enjoys fast processing speed (10 fps on a single GPU). Our collected eye-tracking data and algorithm implementations have been made publicly available at https://github.com/wenguanwang/AGS.
Estimation of texture similarity is fundamental to many material recognition tasks. This study uses fine-grained human perceptual similarity ground-truth to provide a comprehensive evaluation of 51 texture feature sets. We conduct two types of evaluation, and both show that these features do not estimate similarity well when compared against human agreement rates, but that performance improves when the features are combined using a Random Forest. Using a simple two-stage statistical model, we show that few of the features capture long-range aperiodic relationships. We perform two psychophysical experiments which indicate that long-range interactions do provide humans with important cues for estimating texture similarity. This motivates an extension of the study to include Convolutional Neural Networks (CNNs), as they enable arbitrary features of large spatial extent to be learnt. From the use of two pre-trained CNNs, we conclude that the large spatial extent exploited by the networks' top convolutional and first fully-connected layers, together with the use of large numbers of filters, confers a significant advantage for the estimation of perceptual texture similarity.
We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining). These problems are ill-posed, and existing methods usually rely on heuristic image priors. In this paper, we show that these problems can be solved by generative models with adversarial learning. However, a straightforward formulation based on a generative adversarial network (GAN) does not perform well in these tasks, and some structures of the estimated images are usually not preserved well. Motivated by the observation that the estimated results should be consistent with the observed inputs under the physics models, we propose an algorithm that guides the estimation process of a specific task within the GAN framework. The proposed model is trained in an end-to-end fashion and can be applied to a variety of image restoration and low-level vision problems. Extensive experiments demonstrate that the proposed method performs favorably against state-of-the-art algorithms.
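The consistency idea can be sketched for the dehazing case: an estimated clear image, once re-degraded by a physics model, should reproduce the observed input. The sketch below assumes the standard atmospheric scattering model I = J * t + A * (1 - t) and a mean absolute error. The abstract does not specify the model or the loss, so the function names, the toy pixel values, and the loss choice are illustrative assumptions rather than the authors' formulation.

```python
def rehaze(clear, transmission, airlight):
    """Apply the haze formation model I = J * t + A * (1 - t) to each pixel."""
    return [j * t + airlight * (1.0 - t) for j, t in zip(clear, transmission)]

def consistency_loss(observed, clear_estimate, transmission, airlight):
    """Mean absolute difference between the observation and the re-hazed estimate."""
    rehazed = rehaze(clear_estimate, transmission, airlight)
    return sum(abs(o - r) for o, r in zip(observed, rehazed)) / len(observed)

# Toy 3-pixel "image" with assumed transmission values and airlight.
true_clear = [0.2, 0.5, 0.9]
transmission = [0.8, 0.5, 0.4]
airlight = 1.0
observed = rehaze(true_clear, transmission, airlight)

# A correct estimate is perfectly consistent with the observation;
# a poor estimate (all black) is penalized.
good = consistency_loss(observed, true_clear, transmission, airlight)
bad = consistency_loss(observed, [0.0, 0.0, 0.0], transmission, airlight)
```

In a GAN setting, a term of this kind would be added to the adversarial loss so that the generator is penalized whenever its output cannot explain the observed input under the degradation model.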
Recently, deep convolutional neural networks (CNNs) have achieved great success in image restoration (IR) while also providing hierarchical features. However, most deep CNN based IR models do not make full use of the hierarchical features from the original low-quality images, and thereby achieve relatively low performance. In this work, we propose a novel and efficient residual dense network (RDN) to address this problem in IR by making a better tradeoff between efficiency and effectiveness in exploiting the hierarchical features from all the convolutional layers. Specifically, we propose a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and to stabilize the training of wider networks, we propose local feature fusion in RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN with several representative IR applications: single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task, both quantitatively and visually.
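The connectivity pattern of a residual dense block can be sketched at a single pixel, with each convolution replaced by a small linear map (equivalent to a 1x1 convolution). This is a simplified illustration, not the authors' implementation: the channel counts, the random weights, the use of a linear map for local feature fusion, and the additive skip connection are all assumptions introduced here.

```python
import random

def linear(weights, x):
    """Matrix-vector product; stands in for a 1x1 convolution at one pixel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def make_weights(rng, out_dim, in_dim):
    """Small random weight matrix (illustrative initialization only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def residual_dense_block(x, layer_weights, fusion_weights):
    """Dense connections, then local feature fusion, then a residual skip."""
    features = [x]  # state passed in from the preceding block
    for w in layer_weights:
        # Each layer sees the block input and all earlier layer outputs.
        concat = [v for f in features for v in f]
        features.append(relu(linear(w, concat)))
    concat = [v for f in features for v in f]
    fused = linear(fusion_weights, concat)      # local feature fusion
    return [a + b for a, b in zip(x, fused)]    # local residual connection

rng = random.Random(0)
channels, growth, num_layers = 4, 2, 3
# Layer i receives channels + i * growth inputs because of the dense connections.
layer_weights = [make_weights(rng, growth, channels + i * growth)
                 for i in range(num_layers)]
fusion_weights = make_weights(rng, channels, channels + num_layers * growth)

x = [1.0, -0.5, 0.25, 2.0]
y = residual_dense_block(x, layer_weights, fusion_weights)
```

Because the fusion step maps the concatenated features back to the input width, blocks of this shape can be stacked, and the skip connection means a block with zero fusion weights simply passes its input through.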