Statistical clustering of curves in the geosciences - the answer to everything?
L. Hamilton
OCEANS'10 IEEE SYDNEY, 2010-05-24
DOI: 10.1109/OCEANSSYD.2010.5603858
Citations: 0
Abstract
Classification and exploration of large geodata sets consisting of tens to hundreds of thousands of single-valued curves, such as oceanic wave spectra and grain-size distributions, is often carried out using features or proxy parameters extracted from the curves. Feature extraction typically relies on curve fitting, and hence on the implicit or explicit application of a statistical model. Principal Components Analysis (PCA) is commonly applied to the proxies or to the curves themselves as a dimensionality-reduction step, and the first few principal components are used as the final classification features. Statistical clustering is then applied to the selected proxies or principal components to produce groups, or classes, that best describe the properties of the proxy data set, usually in a least-squares sense. A far simpler, model-free alternative is to cluster the curves directly. The curves are treated essentially as geometric entities, so no features or proxies need to be calculated. The methodology is outlined and demonstrated for several geodata sets, ranging in size from a hundred objects to several tens of thousands, and in spatial scale from several tens of metres to the South Pacific Ocean basin.
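The abstract does not state which distance measure or clustering algorithm is used to cluster the curves directly. As a minimal sketch, assuming every curve has already been resampled onto a common grid of abscissae (for example, the same frequency bins for wave spectra or the same size classes for grain-size distributions), each curve can be treated as a vector of ordinates and grouped with ordinary k-means, which minimises the within-cluster squared (least-squares) distance. The function names `cluster_curves`, `init_centroids` and `sq_dist`, the farthest-point seeding and the toy curves are hypothetical illustrations, not the paper's method or data.

```python
from typing import List, Sequence, Tuple

Curve = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean (least-squares) distance between two curves sampled on the same grid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(curves: Sequence[Curve], k: int) -> List[Curve]:
    """Deterministic farthest-point seeding (an assumption, chosen for reproducibility):
    start from the first curve, then repeatedly add the curve farthest from all chosen centroids."""
    centroids = [list(curves[0])]
    while len(centroids) < k:
        farthest = max(curves, key=lambda c: min(sq_dist(c, m) for m in centroids))
        centroids.append(list(farthest))
    return centroids


def cluster_curves(curves: Sequence[Curve], k: int, max_iter: int = 100) -> Tuple[List[int], List[Curve]]:
    """Lloyd's k-means applied directly to the curve ordinates, with no fitted model,
    proxy parameters or PCA step. Returns one cluster label per curve and the centroid curves."""
    if not 1 <= k <= len(curves):
        raise ValueError("k must be between 1 and the number of curves")
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all curves must be sampled on the same grid")

    centroids = init_centroids(curves, k)
    labels = [-1] * len(curves)
    for _ in range(max_iter):
        # Assignment step: each curve joins the nearest centroid curve.
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centroids[j])) for c in curves]
        if new_labels == labels:
            break  # no curve changed cluster, so the partition has converged
        labels = new_labels
        # Update step: each centroid becomes the pointwise mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster happens to empty
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Toy example: three curves peaking in the second bin and two peaking in the fourth bin.
curves = [
    [1.0, 4.0, 2.0, 0.5, 0.1],
    [1.2, 3.8, 2.1, 0.4, 0.2],
    [0.9, 4.2, 1.9, 0.6, 0.1],
    [0.1, 0.5, 2.0, 4.0, 1.0],
    [0.2, 0.4, 2.2, 3.9, 1.1],
]
labels, centroids = cluster_curves(curves, k=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Because each centroid is itself a curve (the pointwise mean of its members), every cluster can be inspected directly as a representative spectrum or distribution. If curve shape rather than magnitude should drive the grouping, one might normalise each curve first (for example, to unit area); that preprocessing choice is likewise an assumption, not something the abstract specifies.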