Parameter-Free Conglomerate Nearest Neighbor Classifier Using Mass-Ratio-Variance Outlier Factors

International Journal of Machine Learning and Computing, Pub Date: 2023-01-01, DOI: 10.18178/ijml.2023.13.4.1145

Patcharasiri Fuangfoo, Krung Sinapiromsaran


Citations: 0

Abstract

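Classification is an important area of machine learning in which a classifier, built from historical data with known classes, assigns a class label to an instance. One popular classifier is k-NN (k-nearest neighbor), which requires a single global parameter k. This global parameter may not suit every instance. Instances naturally lie in different regions relative to the clusters: an interior instance sits inside a cluster, a border instance sits on its outskirts, and an outer instance lies far from any cluster, and each of these calls for a different number of neighbors. To assign a different number of neighbors to each instance automatically, the authors adopt the idea of scoring from anomaly detection research. The Mass-ratio-variance Outlier Factor (MOF) is selected as the scoring scheme that determines the number of neighbors of each instance. MOF gives the highest score to an instance placed very far from any cluster and the lowest score to an instance surrounded by other instances. This leads to the proposed conglomerate nearest neighbor classifier, which requires no parameters: it assigns an appropriate number of neighbors to each instance according to the MOF ordering. Experimental results show that this classifier achieves accuracy similar to that of the k-nearest neighbor algorithm with the best k on the synthesized datasets. Six UCI datasets are also used in the experiments: QSAR, German, Cancer, Wholesale, Haberman, and Glass3. The method performs better on two of them, Wholesale and Glass3, and shows similar performance across the six UCI datasets overall.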
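The abstract gives neither MOF's formula nor the exact rule that turns a MOF score into a neighbor count, so the Python sketch below fills both gaps with clearly labeled assumptions. It assumes MOF(x) is the population variance, over every other point y, of the ratio between the number of points within distance d(x, y) of y and the number within that same distance of x, so interior points get ratios near 1 and low variance while isolated points get widely varying ratios. The MOF-to-k mapping (rank the query's MOF against the training points and scale that rank into 1..ceil(sqrt(n)), giving more neighbors to more outlying queries) is purely hypothetical and is not the authors' rule. The function names `mass`, `mof_scores`, `neighbor_count`, and `conglomerate_predict` are illustrative only.

```python
import math
from collections import Counter
from statistics import pvariance


def mass(points, center, radius):
    """Count the points lying inside the closed ball of `radius` around `center`."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


def mof_scores(points):
    """ASSUMED MOF definition: for each x, the population variance over every
    other point y of the mass ratio mass(y, d(x, y)) / mass(x, d(x, y))."""
    if len(points) < 2:
        raise ValueError("MOF needs at least two points")
    scores = []
    for i, x in enumerate(points):
        ratios = []
        for j, y in enumerate(points):
            if i == j:
                continue
            r = math.dist(x, y)
            # mass(x, r) >= 2 because both x and y lie in the ball, so no division by zero.
            ratios.append(mass(points, y, r) / mass(points, x, r))
        scores.append(pvariance(ratios))
    return scores


def neighbor_count(train_X, query):
    """HYPOTHETICAL MOF-to-k rule (not from the paper): score train + query
    together, take the share of training points with a lower MOF than the
    query, and scale it into 1..ceil(sqrt(n)); higher MOF -> more neighbors."""
    scores = mof_scores(list(train_X) + [query])
    q_score, train_scores = scores[-1], scores[:-1]
    n = len(train_scores)
    k_max = math.ceil(math.sqrt(n))
    frac = sum(s < q_score for s in train_scores) / n
    return 1 + round(frac * (k_max - 1))


def conglomerate_predict(train_X, train_y, query):
    """Majority vote among the query's k nearest training points, with k chosen per query."""
    k = neighbor_count(train_X, query)
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # most_common keeps first-seen order on equal counts, so among tied
    # classes the one with the nearest representative wins.
    return votes.most_common(1)[0][0]


# Usage: two well-separated clusters with labels "a" and "b".
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
train_y = ["a"] * 4 + ["b"] * 4
print(conglomerate_predict(train_X, train_y, (0.5, 0.4)))    # a
print(conglomerate_predict(train_X, train_y, (10.6, 10.3)))  # b
```

Recomputing MOF over the training set plus the query costs O(n³) distance evaluations per query, which is fine for a toy example but would need indexing or caching on real datasets such as the UCI sets listed above.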




Source journal

International Journal of Machine Learning and Computing

Self-citation rate

0.00%

Articles published