Nearest neighbour distributions: New statistical measures for cosmological clustering
Authors: Arka Banerjee, T. Abel
Journal: arXiv: Cosmology and Nongalactic Astrophysics (Journal Article)
Publication date: 2020-07-27
DOI: https://doi.org/10.1093/mnras/staa3604
Citations: 39
Abstract
The use of summary statistics beyond the two-point correlation function to analyze non-Gaussian clustering on small scales is an active field of research in cosmology. In this paper, we explore a set of new summary statistics: the $k$-Nearest Neighbor Cumulative Distribution Functions ($k{\rm NN}$-${\rm CDF}$). This is the empirical cumulative distribution function of distances from a set of volume-filling, Poisson-distributed random points to the $k$-nearest data points, and is sensitive to all connected $N$-point correlations in the data. The $k{\rm NN}$-${\rm CDF}$ can be used to measure counts-in-cells, void probability distributions, and higher $N$-point correlation functions, all using the same formalism, which exploits fast searches with spatial tree data structures. We demonstrate how it can be computed efficiently from various data sets, both discrete points and the generalization to continuous fields. We use data from a large suite of $N$-body simulations to explore the sensitivity of this new statistic to various cosmological parameters, compared to the two-point correlation function, while using the same range of scales. We demonstrate that the use of $k{\rm NN}$-${\rm CDF}$ improves the constraints on the cosmological parameters by more than a factor of $2$ when applied to the clustering of dark matter in the range of scales between $10h^{-1}{\rm Mpc}$ and $40h^{-1}{\rm Mpc}$. We also show that the relative improvement is even greater when applied on the same scales to the clustering of halos in the simulations at a fixed number density, both in real space and in redshift space. Since the $k{\rm NN}$-${\rm CDF}$ are sensitive to all higher-order connected correlation functions in the data, the gains over traditional two-point analyses are expected to grow as progressively smaller scales are included in the analysis of cosmological data.
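The measurement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions: SciPy's `cKDTree` stands in for the "spatial tree data structures", the data live in a periodic cubic box, and the query points are a fixed number of uniform random draws, which only approximates a Poisson process. The function name `knn_cdf` and its parameters (`k_values`, `n_query`, `box_size`) are hypothetical, chosen for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_cdf(data, k_values, n_query, box_size, rng):
    """Empirical kNN-CDFs for points in a periodic cubic box (illustrative sketch).

    Returns a dict mapping each k to (sorted distances, CDF heights).
    """
    # Build the tree on the data points; boxsize enables periodic wrapping.
    tree = cKDTree(data, boxsize=box_size)
    # Volume-filling random query points (assumption: uniform draws in the box).
    queries = rng.uniform(0.0, box_size, size=(n_query, data.shape[1]))
    # Distances to the nearest max(k) data points for every query point.
    dists, _ = tree.query(queries, k=max(k_values))
    dists = dists.reshape(n_query, -1)  # keeps a 2-D shape even when max(k) == 1
    # Heights of the empirical CDF evaluated at the sorted distances.
    heights = np.arange(1, n_query + 1) / n_query
    return {k: (np.sort(dists[:, k - 1]), heights) for k in k_values}


# Usage on an unclustered (uniform random) toy data set in a unit box.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(2000, 3))
cdfs = knn_cdf(points, k_values=[1, 2, 4], n_query=5000, box_size=1.0, rng=rng)
r1, cdf1 = cdfs[1]
vpf = 1.0 - cdf1  # estimated void probability for spheres of radius r1
```

In this sketch, `1 - cdf1` estimates the void probability because a sphere of radius $r$ around a random point is empty exactly when the nearest data point lies farther than $r$. A single tree query with the largest $k$ yields every lower-$k$ distribution at once, which is one way to read the abstract's remark that these measurements all share the same formalism.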