Unsupervised Image Classifier based on Manifold Learning
Author: Jinghao Situ
Journal: Transactions on Computer Science and Intelligent Systems Research, Volume 31 19
Published: 2023-12-21
DOI: 10.62051/31s5nw90 (https://doi.org/10.62051/31s5nw90)
Citations: 0
Abstract
Most image classification tasks today are solved with supervised learning. High-quality datasets are costly to annotate, real-world datasets have a nonlinear structure, and the annotation cost grows exponentially with the number of target classes and with how hard they are to recognise. Unsupervised image classification is therefore a promising research direction. Traditional unsupervised classification methods mostly rely on the Euclidean distance and other norms, which cannot capture the nonlinear structure of a dataset, and this shortcoming causes their accuracy to drop sharply. In this paper, we first extract the nonlinear structure of the original dataset with manifold learning and then generate pseudo-labels with agglomerative clustering. Pseudo-labels obtained this way preserve the intrinsic structure of the original data and are highly accurate. A neural network trained on these pseudo-labels yields an image classifier that requires no manual labels. The classifier can be trained on a small dataset and then applied to large datasets, which saves the cost of manual labelling. The experiments comprise a control group and two manifold learning groups, which extract the nonlinear structure with the LLE and Isomap algorithms respectively. For each group, pseudo-labels are produced, a neural network is trained, and the accuracies of the three groups are compared. The two groups that use manifold learning to extract the nonlinear structure achieve much higher accuracy than the control group, and the Isomap-based image classifier reaches 85% accuracy on the test set, indicating strong practical value.
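The pipeline in the abstract (manifold embedding, agglomerative clustering into pseudo-labels, a neural classifier trained on those labels, and a comparison of a control group with LLE and Isomap groups) can be sketched roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not the author's implementation: the abstract names neither the dataset nor the hyperparameters, so the scikit-learn 8x8 digits set, 20 neighbours, 10 embedding dimensions, Ward linkage, a one-hidden-layer `MLPClassifier` and a 500-sample training split are all assumptions. The helpers `cluster_accuracy`, `pseudo_labels` and `run_group` are hypothetical names, scoring clusters by Hungarian matching to the true classes is an assumed evaluation protocol, and the control group is taken to mean clustering the raw pixels directly.

```python
# Sketch: manifold learning -> agglomerative clustering -> pseudo-labels
# -> neural classifier. Dataset, hyperparameters and helper names are
# illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def cluster_accuracy(y_true, y_pred):
    """Accuracy after matching cluster ids to classes with the Hungarian algorithm."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    counts = np.zeros((k, k), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)  # counts[cluster, class]
    rows, cols = linear_sum_assignment(-counts)  # maximise matched samples
    return counts[rows, cols].sum() / len(y_true)


def pseudo_labels(X, n_clusters, embedder=None):
    """Embed X with a manifold learner (if given), then cluster the result."""
    Z = X if embedder is None else embedder.fit_transform(X)
    return AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(Z)


def run_group(name, embedder, X_small, y_small, X_large, y_large, n_classes, seed=0):
    """Build pseudo-labels on the small set, train an MLP, apply it to the large set."""
    y_pseudo = pseudo_labels(X_small, n_classes, embedder)
    # Assumption: the network sees raw pixels, so new images need no embedding step.
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)
    clf.fit(X_small, y_pseudo)
    pred = clf.predict(X_large)
    return {
        "group": name,
        # True labels are used only for scoring, never for training.
        "pseudo_label_acc": cluster_accuracy(y_small, y_pseudo),
        "test_acc": cluster_accuracy(y_large, pred),
    }


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)  # stand-in dataset: 8x8 digit images
    X = X / 16.0  # scale pixel intensities from 0..16 to 0..1
    X_small, X_large, y_small, y_large = train_test_split(
        X, y, train_size=500, stratify=y, random_state=0
    )
    groups = {
        "control (raw pixels)": None,
        "LLE": LocallyLinearEmbedding(n_neighbors=20, n_components=10, random_state=0),
        "Isomap": Isomap(n_neighbors=20, n_components=10),
    }
    for name, embedder in groups.items():
        r = run_group(name, embedder, X_small, y_small, X_large, y_large, n_classes=10)
        print(f"{r['group']:<22} pseudo-label acc {r['pseudo_label_acc']:.3f}  "
              f"test acc {r['test_acc']:.3f}")
```

Running it as a script prints pseudo-label and test accuracy for each of the three groups; those numbers depend on the stand-in dataset and settings and are not comparable to the 85% reported in the paper. Training the network on raw pixels rather than on the embedding is a design choice in this sketch: it lets the trained classifier be applied to a larger set without re-running the manifold learner.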