Some exemplars are more representative of their category than others. Here we ask whether good exemplars (high representativeness) of real-world scene categories are more informative about their category than bad exemplars (low representativeness), leveraging machine learning methods and a set of deep neural networks, including convolutional neural networks (AlexNet, PlaceNet, ResNet) and transformer models (Vision Transformer, transformer-based CLIP). We used a one-shot Support Vector Machine, in which the training set contains only one exemplar per category (two images in total), to test whether good exemplars generalize to other members of their category better than bad exemplars do. The resulting classification accuracy is interpreted as reflecting the degree of category information present in the training exemplars. We used four natural scene categories (beaches, cities, highways, mountains). Experiment 1 showed that good exemplars produced higher classification accuracy than bad exemplars across all feature spaces tested. Experiment 2 demonstrated that multiple bad training exemplars were needed to match the category information carried by a single good exemplar. Both experiments indicate that good exemplars carry more category-related information than bad ones. Experiment 3 showed that the most informative images and the good exemplars fell not at the center of the category space but toward its edge, and this held for category spaces derived both from the feature spaces and from human similarity judgments. Overall, this work demonstrates that DNNs can capture human representativeness and provide a useful measure of human scene category spaces.
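To make the one-shot procedure concrete, the following is a minimal sketch of the pipeline the abstract describes, not the authors' published code. It assumes PyTorch/torchvision for AlexNet feature extraction (the penultimate fully connected layer as the feature space) and scikit-learn's linear SVM; the choice of a linear kernel, the helper name extract_features, and all image file names are hypothetical placeholders.

```python
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.svm import LinearSVC

# Standard ImageNet preprocessing for AlexNet inputs.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained AlexNet with the final 1000-way classification layer removed,
# so the network outputs 4096-d penultimate-layer activations.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.classifier = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

def extract_features(path):
    """Return a 4096-d feature vector for a single image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return alexnet(img).squeeze(0).numpy()

# One training exemplar per category: two images in total
# (e.g., one beach and one mountain; file names are hypothetical).
X_train = np.stack([extract_features("beach_exemplar.jpg"),
                    extract_features("mountain_exemplar.jpg")])
y_train = np.array([0, 1])  # 0 = beach, 1 = mountain

# Train the SVM on the single exemplar pair.
clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)

# Accuracy on held-out category members is then read as the amount of
# category information carried by the training exemplars.
test_paths = ["beach_test_01.jpg", "mountain_test_01.jpg"]  # hypothetical
X_test = np.stack([extract_features(p) for p in test_paths])
y_test = np.array([0, 1])
print("one-shot accuracy:", clf.score(X_test, y_test))
```

Under this reading, comparing the accuracy obtained when the two training images are good exemplars against when they are bad exemplars (averaged over many test images and exemplar pairs) operationalizes the question the abstract poses.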
