Gerardo A. Lo Valvo, Oscar E. R. Lehmann, Diego Balseiro
A novel distance that reduces information loss in continuous characters with few observations
Palaeontologia Electronica, 2023. DOI: 10.26879/1250 (https://doi.org/10.26879/1250)
Abstract
The calculation of pairwise distances is a fundamental step in many statistical analyses in biology and paleontology. The most commonly used distances work with a single observation per object and character, but there are scenarios where multiple observations are available per object. In these situations, the information for the character spans an interval, and pairs of objects can have overlapping intervals, which further complicates the distance calculation. Some coefficients can deal with this wealth of information but are either too coarse to provide detailed results or too computationally demanding for even moderately large data sets. Here, we present the Distance Between Intervals (DBI) as a novel semi-metric distance that can accommodate both single and multiple observations per object by analyzing them as intervals. The DBI ranges from 0 to 1 when there is an overlap between the objects and from 1 to infinity when there is no overlap between them. It is easy to calculate and can be applied to a wide variety of data types. Both simulated and empirical test cases show that the DBI correctly ranks pairs of objects by their level of overlap and non-overlap, while other distances struggle to do so. Therefore, the DBI can provide a finer level of definition than other available distances for empirical data sets, while generally agreeing with the broad results they provide. An implementation of DBI is provided for the R programming language.
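The interval view described in the abstract can be illustrated with a minimal Python sketch. It does not compute the DBI, whose formula is not stated in this abstract, and it is not the authors' R implementation. It only shows how one or more observations might be reduced to an interval and how a pair of intervals can be classified as overlapping or not. Taking the interval as the minimum and maximum of the observations is an assumption made here, and the function names `to_interval`, `overlap_length`, `gap_length`, and `intervals_overlap` are hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def to_interval(observations: Sequence[float]) -> Interval:
    """Reduce the observations of one object to a (min, max) interval.

    Assumption: the interval is the observed range. A single observation
    yields a degenerate interval whose two ends are equal.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    return (min(observations), max(observations))


def overlap_length(a: Interval, b: Interval) -> float:
    """Length shared by the two intervals; 0.0 if they do not overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def gap_length(a: Interval, b: Interval) -> float:
    """Empty space between the two intervals; 0.0 if they overlap or touch."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True when the intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


if __name__ == "__main__":
    # Hypothetical measurements of one character for three objects.
    x = to_interval([3.0, 5.0, 4.0])
    y = to_interval([4.0, 8.0])
    z = to_interval([7.5])
    for name, (p, q) in {"x-y": (x, y), "x-z": (x, z), "y-z": (y, z)}.items():
        print(name, intervals_overlap(p, q), overlap_length(p, q), gap_length(p, q))
```

A distance such as the DBI would turn the overlap and the gap into a single value per pair; here they are only reported separately.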
About the journal:
Founded in 1997, Palaeontologia Electronica (PE) is the longest running open-access, peer-reviewed electronic journal and covers all aspects of palaeontology. PE uses an external double-blind peer review system for all manuscripts. Copyright of scientific papers is held by one of the three sponsoring professional societies at the author's choice. Reviews, commentaries, and other material are placed in the public domain. PE papers comply with regulations for taxonomic nomenclature established in the International Code of Zoological Nomenclature and the International Code of Nomenclature for Algae, Fungi, and Plants.