Short communication: The Wasserstein distance as a dissimilarity metric for comparing detrital age spectra and other geological distributions
A. Lipp, P. Vermeesch
Geochronology, published 2023-05-17
DOI: 10.5194/gchron-5-263-2023 (https://doi.org/10.5194/gchron-5-263-2023)
Citations: 2
Abstract
Distributional data such as detrital age populations or grain size distributions are common in the geological sciences. As analytical techniques become more sophisticated, increasingly large amounts of distributional data are being gathered. These advances require quantitative and objective methods, such as multidimensional scaling (MDS), to analyse large numbers of samples. Crucial to such methods is choosing a sensible measure of dissimilarity between samples. At present, the Kolmogorov–Smirnov (KS) statistic is the most widely used of these dissimilarity measures. However, the KS statistic has some limitations, such as high sensitivity to differences between the modes of two distributions and insensitivity to their tails. Here, we propose the Wasserstein-2 distance (W2) as an additional and alternative metric for use in geochronology. Whereas the KS distance is defined as the maximum vertical distance between two empirical cumulative distribution functions, the W2 distance is a function of the horizontal distances (i.e. age differences) between observations. Using a variety of synthetic and real datasets, we explore scenarios where W2 may provide greater geological insight than the KS statistic. We find that in cases where absolute time differences are not relevant (e.g. mixing of known, discrete age peaks), the KS statistic can be more intuitive. However, in scenarios where absolute age differences are important (e.g. temporally and/or spatially evolving sources, thermochronology, and overcoming laboratory biases), W2 is preferable. The W2 distance has been added to the R package IsoplotR for immediate use in detrital geochronology and other applications. The W2 distance can be generalized to multiple dimensions, which opens opportunities beyond distributional data.
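The contrast between the "vertical" KS distance and the "horizontal" W2 distance can be illustrated with a minimal sketch. This is not the IsoplotR implementation. The function names `ks_distance` and `w2_distance` and the three age samples are hypothetical, chosen only for illustration, and the W2 value is computed here with the standard one-dimensional quantile-matching formula for empirical distributions.

```python
from bisect import bisect_right
from math import sqrt


def ks_distance(x, y):
    """Maximum vertical gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Both ECDFs are step functions that only change at observed values,
    # so evaluating the gap at every observation is sufficient.
    return max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )


def w2_distance(x, y):
    """Wasserstein-2 distance between two 1-D empirical distributions.

    In one dimension the optimal transport plan pairs equal quantiles, so
    W2^2 is the integral of the squared horizontal gap between the two
    quantile functions. Sample sizes may differ.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # Work in integer mass units of 1/(n*m): each x carries m units,
    # each y carries n units, so both samples total n*m units.
    i = j = 0
    rem_x, rem_y = m, n
    total = 0.0
    while i < n and j < m:
        step = min(rem_x, rem_y)          # mass moved between xs[i] and ys[j]
        total += step * (xs[i] - ys[j]) ** 2
        rem_x -= step
        rem_y -= step
        if rem_x == 0:
            i, rem_x = i + 1, m
        if rem_y == 0:
            j, rem_y = j + 1, n
    return sqrt(total / (n * m))


# Hypothetical detrital ages in Ma (illustrative values, not real data).
a = [500, 510, 520]
b = [530, 540, 550]      # offset from a by 30 Myr
c = [1500, 1510, 1520]   # offset from a by 1000 Myr

print(ks_distance(a, b), ks_distance(a, c))  # 1.0 1.0
print(w2_distance(a, b), w2_distance(a, c))  # 30.0 1000.0
```

Because the samples do not overlap, KS saturates at 1 for both pairs and cannot tell the 30 Myr offset from the 1000 Myr offset, whereas W2 returns the offsets themselves. This is the sense in which W2 suits cases where absolute age differences matter.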