Algorithms for biodistance analysis based on various squared Euclidean and generalized Mahalanobis distances combined with probabilistic hierarchical cluster analysis and multidimensional scaling
Authors: Efthymia Nikita, Panos Nikitas
Journal: Archaeological and Anthropological Sciences, 16(12)
Published: 2024-11-08
DOI: 10.1007/s12520-024-02098-y
URL: https://link.springer.com/article/10.1007/s12520-024-02098-y
Abstract
Biodistance analysis uses phenotypic data to identify groups that exhibit biological affinity. This study proposes algorithms for biodistance analysis based on various squared Euclidean and generalized Mahalanobis distances, combines them with probabilistic hierarchical cluster analysis (HCA) and multidimensional scaling (MDS), and evaluates their performance. Four archaeological datasets of human dental metrics and/or non-metric traits were used. To analyze the data, we integrated our previous work on biodistances and developed algorithms that calculate various types of squared Euclidean and generalized Mahalanobis distances, estimate various parameters, apply modified MDS and HCA methods to compute all possible cluster probabilities, and provide MDS confidence ellipses and dendrograms annotated with cluster probabilities. All algorithms are implemented in R. The data analysis showed that all distances studied are simulated very satisfactorily by the Monte Carlo method, which yields accurate estimates of the cluster probabilities. The probabilities of expected cluster formation are higher when calculated using generalized Mahalanobis distances than when calculated using the corresponding Euclidean distances. The cluster probabilities therefore indicate that generalized Mahalanobis distances perform better than the corresponding Euclidean distances in cluster analysis. From a methodological point of view, clustering information about population affinities should not be based on a single dendrogram; it should instead be extracted from the list of the most frequent clusters obtained from all simulated dendrograms.
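The closing recommendation, reading affinities from the most frequent clusters across many simulated dendrograms rather than from a single tree, can be illustrated with a small sketch. This is not the authors' R implementation. It is a minimal Python illustration under several assumptions: the Monte Carlo step is replaced by a plain bootstrap of each group's observations, distances are squared Euclidean distances between group means, and clustering uses average linkage. All function names and the synthetic toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_euclidean(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dendrogram_clusters(means):
    """Average-linkage agglomeration on group means.

    Returns the non-trivial clusters (frozensets of group labels) that
    appear as internal nodes of the dendrogram, excluding the root.
    """
    base = {pair: sq_euclidean(means[pair[0]], means[pair[1]])
            for pair in combinations(sorted(means), 2)}

    def d(a, b):
        return base[(a, b)] if (a, b) in base else base[(b, a)]

    def linkage(c1, c2):
        # Unweighted average of all between-cluster pairwise distances.
        return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    active = [frozenset([g]) for g in sorted(means)]
    found = set()
    while len(active) > 2:  # the final merge only yields the full set
        c1, c2 = min(combinations(active, 2), key=lambda p: linkage(*p))
        merged = c1 | c2
        active = [c for c in active if c not in (c1, c2)] + [merged]
        found.add(merged)
    return found


def cluster_probabilities(groups, n_sim=200, seed=0):
    """Fraction of simulated dendrograms in which each cluster appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sim):
        means = {}
        for label, rows in groups.items():
            # Assumption: bootstrap resampling stands in for the
            # Monte Carlo simulation described in the abstract.
            sample = [rng.choice(rows) for _ in rows]
            means[label] = mean_vector(sample)
        counts.update(dendrogram_clusters(means))
    return {tuple(sorted(c)): k / n_sim for c, k in counts.items()}


def toy_group(rng, centre, n=30, sd=1.0):
    """Synthetic observations scattered around a centre (made-up data)."""
    return [[rng.gauss(m, sd) for m in centre] for _ in range(n)]


data_rng = random.Random(1)
toy = {
    "A": toy_group(data_rng, [0.0, 0.0]),
    "B": toy_group(data_rng, [0.5, 0.0]),
    "C": toy_group(data_rng, [10.0, 10.0]),
    "D": toy_group(data_rng, [10.5, 10.0]),
    "E": toy_group(data_rng, [5.0, 5.0]),
}
probs = cluster_probabilities(toy)
for cluster, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(cluster, round(p, 3))
```

In this synthetic setup, groups A and B and groups C and D are separated widely enough that each pair merges in every replicate, while the intermediate group E joins one pair or the other depending on the resample. A single dendrogram would hide that ambiguity; the frequency table exposes it. Swapping `sq_euclidean` for a Mahalanobis-type distance would need a covariance estimate, which this sketch leaves out.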
About the journal:
Archaeological and Anthropological Sciences covers the full spectrum of natural scientific methods, with an emphasis on the archaeological contexts and the questions being studied. It bridges the gap between archaeologists and natural scientists, providing a forum to encourage the continued integration of scientific methodologies in archaeological research.
Coverage in the journal includes: archaeology, geology/geophysical prospection, geoarchaeology, geochronology, palaeoanthropology, archaeozoology and archaeobotany, genetics and other biomolecules, material analysis and conservation science.
The journal is endorsed by the German Society of Natural Scientific Archaeology and Archaeometry (GNAA), the Hellenic Society for Archaeometry (HSC), the Association of Italian Archaeometrists (AIAr) and the Society of Archaeological Sciences (SAS).