Testing Thresholds for High-Dimensional Sparse Random Geometric Graphs
Siqi Liu, Sidhanth Mohanty, Tselil Schramm, Elizabeth Yang
SIAM Journal on Computing, Ahead of Print. Published 2024-02-27. DOI: 10.1137/23m1545203
Abstract
The random geometric graph model $\mathsf{Geo}_d(n,p)$ is a distribution over graphs in which the edges capture a latent geometry. To sample $G \sim \mathsf{Geo}_d(n,p)$, we identify each of our $n$ vertices with an independently and uniformly sampled vector from the $d$-dimensional unit sphere $\mathbb{S}^{d-1}$, and we connect pairs of vertices whose vectors are “sufficiently close,” such that the marginal probability of an edge is $p$. Because of the underlying geometry, this model is natural for applications in data science and beyond. We investigate the problem of testing for this latent geometry, or, in other words, distinguishing an Erdős–Rényi graph $\mathsf{G}(n,p)$ from a random geometric graph $\mathsf{Geo}_d(n,p)$. It is not too difficult to show that if $d \to \infty$ while $n$ is held fixed, the two distributions become indistinguishable; we wish to understand how fast $d$ must grow as a function of $n$ for indistinguishability to occur. When $p = \alpha/n$ for constant $\alpha$, we prove that if $d \geq \mathrm{polylog}(n)$, the total variation distance between the two distributions is close to 0; this improves upon the best previous bound of Brennan, Bresler, and Nagaraj (2020), which required $d \gg n^{3/2}$, and further our result is nearly tight, resolving a conjecture of Bubeck, Ding, Eldan, and Rácz (2016) up to logarithmic factors. We also obtain improved upper bounds on the statistical indistinguishability thresholds in $d$ for the full range of $p$ satisfying $1/n \leq p \leq 1/2$, improving upon the previous bounds by polynomial factors. Our analysis uses the belief propagation algorithm to characterize the distributions of (subsets of) the random vectors conditioned on producing a particular graph. In this sense, our analysis is connected to the “cavity method” from statistical physics. To analyze this process, we rely on novel sharp estimates for the area of the intersection of a random sphere cap with an arbitrary subset of $\mathbb{S}^{d-1}$, which we prove using optimal transport maps and entropy-transport inequalities on the unit sphere. We believe these techniques may be of independent interest.
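The sampling procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that “sufficiently close” means the inner product of two vectors is at least a threshold $\tau$, chosen so that each edge appears with probability exactly $p$. It computes $\tau$ numerically from the fact that, for independent uniform $x, y \in \mathbb{S}^{d-1}$, the inner product $\langle x, y\rangle$ has density proportional to $(1-t^2)^{(d-3)/2}$ on $[-1,1]$ (the sketch requires $d \geq 3$). All function names (`sphere_point`, `edge_threshold`, `sample_geo`, `sample_er`, `count_triangles`) are hypothetical. The triangle count is only a crude illustration of why geometry is easy to detect when $d$ is small. It is not the testing method or the analysis used in the paper, which concerns the regime where the two distributions are statistically indistinguishable.

```python
import math
import random


def sphere_point(d, rng):
    """Uniform point on the unit sphere S^{d-1} in R^d (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def _inner_product_mass(a, b, d, steps):
    """Midpoint-rule integral of (1 - t^2)^((d - 3) / 2) over [a, b]."""
    h = (b - a) / steps
    exponent = (d - 3) / 2
    return h * sum((1.0 - (a + (k + 0.5) * h) ** 2) ** exponent for k in range(steps))


def edge_threshold(p, d, steps=4000, iters=50):
    """Threshold tau with P(<x, y> >= tau) = p for independent uniform x, y on S^{d-1}.

    Assumes d >= 3, where the inner-product density is bounded on [-1, 1].
    """
    if d < 3 or not 0.0 < p < 1.0:
        raise ValueError("need d >= 3 and 0 < p < 1")
    total = _inner_product_mass(-1.0, 1.0, d, steps)
    lo, hi = -1.0, 1.0
    for _ in range(iters):  # bisection: the tail probability decreases in tau
        mid = (lo + hi) / 2
        if _inner_product_mass(mid, 1.0, d, steps) / total > p:
            lo = mid  # edge probability too high: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2


def sample_geo(n, d, p, seed=None):
    """Hypothetical sampler: edge {i, j} iff <x_i, x_j> >= tau, so P(edge) = p."""
    rng = random.Random(seed)
    tau = edge_threshold(p, d)
    pts = [sphere_point(d, rng) for _ in range(n)]
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sum(a * b for a, b in zip(pts[i], pts[j])) >= tau
    }


def sample_er(n, p, seed=None):
    """Erdős–Rényi G(n, p): each pair is an edge independently with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def count_triangles(n, edges):
    """Count triangles in a graph given as a set of pairs (i, j) with i < j."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    # Each triangle i < j < k is counted once, from its edge (i, j).
    return sum(1 for i, j in edges for k in adj[i] & adj[j] if k > j)


if __name__ == "__main__":
    n, p = 60, 0.2
    for d in (3, 30, 300):
        print("d =", d, "triangles:", count_triangles(n, sample_geo(n, d, p, seed=d)))
    print("G(n, p) triangles:", count_triangles(n, sample_er(n, p, seed=0)))
```

Computing $\tau$ from the exact law of the inner product, rather than from an empirical quantile of the sampled vectors, matches the requirement that every pair is an edge with marginal probability $p$. For $d = 3$ the inner product is uniform on $[-1,1]$, so the threshold reduces to $\tau = 1 - 2p$, which gives a quick sanity check.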
About the journal:
The SIAM Journal on Computing aims to provide coverage of the most significant work going on in the mathematical and formal aspects of computer science and nonnumerical computing. Submissions must be clearly written and make a significant technical contribution. Topics include but are not limited to analysis and design of algorithms, algorithmic game theory, data structures, computational complexity, computational algebra, computational aspects of combinatorics and graph theory, computational biology, computational geometry, computational robotics, the mathematical aspects of programming languages, artificial intelligence, computational learning, databases, information retrieval, cryptography, networks, distributed computing, parallel algorithms, and computer architecture.