{"title":"两两相似性划分算法:一种使用任意相似性度量对地球科学和其他数据集进行无监督划分的方法","authors":"G. Petty","doi":"10.1175/aies-d-22-0005.1","DOIUrl":null,"url":null,"abstract":"\nA simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, non-overlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, though clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two non-trivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, it is also well-suited for isolating rare or anomalous members of a dataset. The method is inductive, in that prototypes identified in representative subset of a larger dataset can be used to classify the remainder.","PeriodicalId":94369,"journal":{"name":"Artificial intelligence for the earth systems","volume":"25 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics\",\"authors\":\"G. Petty\",\"doi\":\"10.1175/aies-d-22-0005.1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\nA simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, non-overlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, though clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two non-trivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, it is also well-suited for isolating rare or anomalous members of a dataset. The method is inductive, in that prototypes identified in representative subset of a larger dataset can be used to classify the remainder.\",\"PeriodicalId\":94369,\"journal\":{\"name\":\"Artificial intelligence for the earth systems\",\"volume\":\"25 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial intelligence for the earth systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1175/aies-d-22-0005.1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence for the earth systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1175/aies-d-22-0005.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics
A simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, non-overlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, though clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two non-trivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, it is also well-suited for isolating rare or anomalous members of a dataset. The method is inductive, in that prototypes identified in representative subset of a larger dataset can be used to classify the remainder.