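The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics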

Artificial Intelligence for the Earth Systems. Pub Date: 2022-08-26. DOI: 10.1175/aies-d-22-0005.1

G. Petty


Citations: 0

Abstract



The Pairwise Similarity Partitioning algorithm: a method for unsupervised partitioning of geoscientific and other datasets using arbitrary similarity metrics

A simple yet flexible and robust algorithm is described for fully partitioning an arbitrary dataset into compact, non-overlapping groups or classes, sorted by size, based entirely on a pairwise similarity matrix and a user-specified similarity threshold. Unlike many clustering algorithms, there is no assumption that natural clusters exist in the dataset, though clusters, when present, may be preferentially assigned to one or more classes. The method also does not require data objects to be compared within any coordinate system but rather permits the user to define pairwise similarity using almost any conceivable criterion. The method therefore lends itself to certain geoscientific applications for which conventional clustering methods are unsuited, including two non-trivial and distinctly different datasets presented as examples. In addition to identifying large classes containing numerous similar dataset members, it is also well-suited for isolating rare or anomalous members of a dataset. The method is inductive, in that prototypes identified in a representative subset of a larger dataset can be used to classify the remainder.
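The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of one plausible greedy reading of it, not the published method. It assumes that each class is built around a "prototype" item, that the prototype chosen at each step is the unassigned item with the most unassigned neighbours at or above the threshold, and that new data are assigned to the first prototype they match. The function names `greedy_threshold_partition` and `classify_with_prototypes`, the exponential similarity in the usage lines, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
import numpy as np


def greedy_threshold_partition(sim, threshold):
    """Greedy partition from a pairwise similarity matrix.

    ASSUMPTION: this greedy rule is an illustrative guess, not the
    algorithm as published. Returns (labels, prototypes), where class 0
    is the largest class and later classes are never larger.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    adjacent = sim >= threshold
    np.fill_diagonal(adjacent, True)  # every item is similar to itself
    labels = np.full(n, -1, dtype=int)
    prototypes = []
    unassigned = np.ones(n, dtype=bool)
    while unassigned.any():
        # For each item, count its still-unassigned neighbours.
        counts = (adjacent & unassigned).sum(axis=1)
        counts[~unassigned] = -1  # assigned items cannot become prototypes
        proto = int(np.argmax(counts))  # ties go to the lowest index
        members = adjacent[proto] & unassigned
        labels[members] = len(prototypes)
        prototypes.append(proto)
        unassigned &= ~members
    return labels, prototypes


def classify_with_prototypes(sim_to_prototypes, threshold):
    """Assign each new item to the first prototype it matches, else -1.

    sim_to_prototypes has one row per new item and one column per
    prototype, in the order returned by greedy_threshold_partition.
    """
    hits = np.asarray(sim_to_prototypes, dtype=float) >= threshold
    return np.where(hits.any(axis=1), hits.argmax(axis=1), -1)


# Usage with a toy, hypothetical similarity: exp(-|x_i - x_j|) on 1-D points.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 9.0])
sim = np.exp(-np.abs(x[:, None] - x[None, :]))
labels, prototypes = greedy_threshold_partition(sim, threshold=0.8)
print(labels, prototypes)
```

Because only the similarity matrix and the threshold enter the computation, no coordinate system is needed; the 1-D points above serve only to generate a matrix. In this sketch, isolated items end up in singleton classes at the end of the ordering, which is one way rare or anomalous members could be surfaced.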


Source journal

Artificial intelligence for the earth systems

Self-citation rate

0.00%

Number of publications