Fan Liu, Feifan Li, Sai Yang. Few-shot classification using Gaussianisation prototypical classifier.
IET Computer Vision 2023 February; 17(1); 62–75. https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.12129
Existing deep learning-based monocular depth estimation methods have difficulty estimating depth near object edges when the depth between those objects changes abruptly, and their accuracy declines when an image contains more noise. Furthermore, these methods consume considerable hardware resources because of their huge number of network parameters. To solve these problems, this paper proposes a depth estimation method based on weighted fusion and point-wise convolution. The authors design a maximum-average adaptive pooling weighted fusion module (MAWF) that fuses global and local features, and a continuous point-wise convolution module that processes the fused features produced by the MAWF module. The two modules are applied together three times to perform weighted fusion and point-wise convolution of the multi-scale features from the encoder output, which better decodes the depth information of a scene. Experimental results show that the method achieves state-of-the-art performance on the KITTI dataset, with δ1 up to 0.996 and the root mean square error reduced by 8%, and demonstrates strong generalisation and robustness.
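A minimal PyTorch-style sketch of how a max/average adaptive-pooling weighted fusion followed by point-wise (1×1) convolutions could look; the module names, channel sizes, and sigmoid-based weighting below are assumptions for illustration, not the authors' exact MAWF design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAWF(nn.Module):
    """Weighted fusion of a global and a local feature map using max- and
    average-pooled channel descriptors (illustrative sketch only)."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, global_feat, local_feat):
        b, c, _, _ = global_feat.shape
        max_desc = F.adaptive_max_pool2d(global_feat, 1).view(b, c)   # global cue
        avg_desc = F.adaptive_avg_pool2d(local_feat, 1).view(b, c)    # local cue
        w = self.fc(torch.cat([max_desc, avg_desc], dim=1)).view(b, c, 1, 1)
        return w * global_feat + (1.0 - w) * local_feat               # weighted fusion

class PointwiseBlock(nn.Module):
    """Stacked point-wise (1x1) convolutions applied to the fused features."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return self.block(x)
```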
Most successful person re-ID models are trained with supervision and need a large amount of training data, and they fail to generalise well on unseen, unlabelled testing sets. The authors aim to learn a generalisable person re-identification model. The model uses one labelled source dataset and one unlabelled target dataset during training and generalises well on the target testing set. To this end, after feature extraction by a ResNeXt-50 network, the authors optimise the model with three loss functions. (a) One loss function learns the features of the target domain by tuning the distances between target images, so the trained model is more robust to intra-domain variations in the target domain and generalises well on the target testing set. (b) A triplet loss considers both source and target domains and makes the model learn the inter-domain variations between the source and target domains as well as the variations within the target domain. (c) A third loss function performs supervised learning on the labelled source domain. Extensive experiments on Market-1501 and DukeMTMC-reID show that the model achieves very competitive performance compared with state-of-the-art models while requiring an acceptable amount of GPU memory compared with other successful models.
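A hedged sketch of the cross-domain triplet objective described in (b); the hard-negative mining strategy and the 0.3 margin below are assumptions, since the authors' exact formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def cross_domain_triplet_loss(anchor, positive, negatives, margin=0.3):
    """Pull each anchor towards its positive and push it away from the hardest
    negative, which may come from either the source or the target domain."""
    d_pos = F.pairwise_distance(anchor, positive)              # (B,)
    d_neg = torch.cdist(anchor, negatives).min(dim=1).values   # hardest negative per anchor
    return F.relu(d_pos - d_neg + margin).mean()
```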
The authors wish to bring to the readers' attention the following error in the article by He, D., et al.: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification [1].
In the Funding Information section, the grant number for the National Natural Science Foundation of China is incorrectly given as 2022KYCX032Z. It should be 62171321.
Sketch face recognition has a wide range of applications in criminal investigation, but it remains challenging owing to small sample sizes and the semantic deficiencies caused by cross-modality differences. The authors propose a light semantic Transformer network to extract and model the semantic information of cross-modality images. First, the authors employ a meta-learning training strategy to obtain task-related training samples and thus address the small-sample problem. Then, to resolve the contradiction between the high complexity of the Transformer and the small-sample nature of sketch face recognition, the authors build the light semantic Transformer network by proposing a hierarchical group linear transformation and introducing parameter sharing, which can extract highly discriminative semantic features from small-scale datasets. Finally, the authors propose a domain-adaptive focal loss to reduce the cross-modality differences between sketches and photos and to improve the training of the light semantic Transformer network. Extensive experiments show that the features extracted by the proposed method are highly discriminative. The method improves the recognition rate by 7.6% on the UoM-SGFSv2 dataset, and the recognition rate reaches 92.59% on the CUFSF dataset.
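A minimal sketch of what a domain-adaptive focal loss could look like; the per-sample domain weight (domain_alpha) and gamma = 2.0 are assumptions, since the authors' exact weighting scheme is not spelled out here.

```python
import torch
import torch.nn.functional as F

def domain_adaptive_focal_loss(logits, targets, domain_alpha, gamma=2.0):
    """Focal loss with a per-sample weight that depends on the sample's domain
    (sketch or photo); domain_alpha is a tensor holding those weights."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)   # log-prob of the true class
    pt = log_pt.exp()
    return (-domain_alpha * (1.0 - pt) ** gamma * log_pt).mean()
```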
Heatmap-based regression (HBR) methods have long dominated the face alignment field, but they require complex designs and post-processing. In this study, the authors propose an end-to-end and simple coordinate-based regression (CBR) method called Dynamic Deformable Transformer (DDT) for face alignment. Unlike generic pre-defined landmark queries, DDT uses Dynamic Landmark Queries (DLQs) to query landmarks' classes and coordinates together. In addition, DDT adopts a deformable attention mechanism rather than regular attention, which gives faster convergence and lower computational complexity. Experimental results on three mainstream datasets, 300W, WFLW, and COFW, demonstrate that DDT exceeds the state-of-the-art CBR methods by a large margin and is comparable to the current state-of-the-art HBR methods with much less computational complexity.
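A rough sketch of coordinate regression from learnable landmark queries; regular multi-head attention stands in here for DDT's deformable attention, and the feature dimension (256) and landmark count (68) are assumptions.

```python
import torch
import torch.nn as nn

class LandmarkQueryHead(nn.Module):
    """Learnable per-landmark queries attend to image tokens and regress
    class logits and normalised (x, y) coordinates (illustrative only)."""
    def __init__(self, dim=256, num_landmarks=68):
        super().__init__()
        self.queries = nn.Embedding(num_landmarks, dim)    # one query per landmark
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.coord_head = nn.Linear(dim, 2)                # (x, y) per landmark
        self.cls_head = nn.Linear(dim, num_landmarks)      # landmark class logits

    def forward(self, image_tokens):
        # image_tokens: (B, N, dim) flattened backbone features
        q = self.queries.weight.unsqueeze(0).expand(image_tokens.size(0), -1, -1)
        out, _ = self.attn(q, image_tokens, image_tokens)
        return self.coord_head(out).sigmoid(), self.cls_head(out)
```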
Clean energy is a major trend, and photovoltaic power generation is becoming increasingly important. Photovoltaic output is mainly affected by the weather and is therefore full of uncertainty. Previous work has relied chiefly on historical photovoltaic data for time-series forecasts, but unforeseen weather conditions can skew such forecasts. Consequently, a spatial-temporal-meteorological long short-term memory prediction model (STM-LSTM) is proposed to compensate for the inability of existing photovoltaic prediction models to handle these uncertainties. The model simultaneously processes satellite image data, historical meteorological data, and historical power generation data, extracting historical patterns and meteorological change information to improve the accuracy of photovoltaic prediction. STM-LSTM processes raw satellite data to obtain cloud images and extracts cloud motion information using the dense optical flow method. First, the cloud images are processed to extract cloud position information, and adaptive attentive learning over images in different bands yields a better representation for subsequent tasks. Second, historical meteorological data are processed to learn meteorological change patterns. Finally, the historical photovoltaic power generation sequences are combined to obtain the final photovoltaic prediction results. Experimental validation shows that the proposed STM-LSTM model clearly improves on the baseline model.
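An illustrative sketch of extracting cloud motion with dense optical flow, using OpenCV's Farneback implementation; the parameter values below are common defaults, not the authors' settings.

```python
import cv2
import numpy as np

def cloud_motion(prev_frame, next_frame):
    """Dense optical flow between two consecutive grayscale cloud images."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_frame, next_frame, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    # Convert the (dx, dy) field to per-pixel speed and direction of cloud movement.
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return magnitude, angle
```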
Data augmentation diversifies the information in a dataset. For class imbalance, copy-paste augmentation generates new class information to alleviate the problem. However, these methods rely excessively on human intuition, and over-fitting or under-fitting can occur when class information is added inappropriately. The authors propose a self-adaptive data augmentation, the copy-paste with self-adaptation (CPA) algorithm, which mitigates this over-fitting and under-fitting. In CPA, the evaluation results of a model are taken as an important basis for adjustment: the evaluation results are combined with the class-imbalance information to generate a set of class weights, and different amounts of class information are replenished according to these weights. Finally, the generated images are inserted into the training dataset and the model starts formal training. The experimental results show that CPA can alleviate class imbalance. On the TT100K dataset, YOLOv3 trained with the optimised dataset gains 2% AP; on the VOC2007 dataset, the mAP of RetinaNet on the optimised dataset is 78.46, which is 1.2% higher than on the original dataset; and on the COCO2017 dataset, SSD300 trained with the optimised dataset gains 1.3% AP.
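A hedged sketch of turning per-class evaluation results and class frequencies into copy-paste weights; the exact weighting formula used by CPA is not given here, so the combination and normalisation below are assumptions.

```python
import numpy as np

def copy_paste_weights(per_class_ap, per_class_count):
    """Classes that are both rare and poorly detected receive larger weights,
    i.e. more pasted instances in the next training round."""
    ap = np.asarray(per_class_ap, dtype=float)
    count = np.asarray(per_class_count, dtype=float)
    rarity = 1.0 / (count / count.sum() + 1e-6)   # inverse class frequency
    difficulty = 1.0 - ap                          # low AP -> high difficulty
    weights = rarity * difficulty
    return weights / weights.sum()                 # normalised paste probabilities
```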