New instability results for high-dimensional nearest neighbor search
Chris Giannella
Information Processing Letters, Volume 109, Issue 19, Pages 1109-1113
Published: 2009-09-15
DOI: 10.1016/j.ipl.2009.07.012
URL: https://www.sciencedirect.com/science/article/pii/S0020019009002257
Citations: 10
Abstract
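Consider a dataset of $n(d)$ points generated independently from $\mathbb{R}^d$ according to a common p.d.f. $f_d$ with $\mathrm{support}(f_d) = [0,1]^d$ and $\sup\{f_d(\mathbb{R}^d)\}$ growing sub-exponentially in $d$. We prove that: (i) if $n(d)$ grows sub-exponentially in $d$, then, for any query point $\vec{q} \in [0,1]^d$ and any $\epsilon > 0$, the ratio of the distances from any two dataset points to $\vec{q}$ is less than $1+\epsilon$ with probability $\to 1$ as $d \to \infty$; (ii) if $n(d) > [4(1+\epsilon)]^d$ for large $d$, then for all $\vec{q} \in [0,1]^d$ (except a small subset) and any $\epsilon > 0$, the distance ratio is less than $1+\epsilon$ with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when $f_d = N(\vec{\mu}_d, \Sigma_d)$.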
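Claim (i) can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes the uniform density on the unit cube (one density that meets the abstract's conditions), a fixed dataset size of 200 points (trivially sub-exponential in d), and a query point that is itself drawn uniformly. The function name `distance_ratio` and all parameter values are hypothetical choices made for this illustration. It reports the ratio of the farthest to the nearest dataset point from the query, which bounds the distance ratio of every pair.

```python
import math
import random


def distance_ratio(n, d, rng):
    """Ratio of farthest to nearest Euclidean distance from a random query
    to n points, all drawn uniformly from the unit cube [0, 1]^d.

    Assumption for illustration: uniform density and a random query point;
    the abstract covers a broader class of densities and any fixed query.
    """
    q = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n):
        p = [rng.random() for _ in range(d)]
        dists.append(math.dist(p, q))
    return max(dists) / min(dists)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    for d in (2, 10, 100, 1000):
        # The printed ratio should shrink toward 1 as d grows.
        print(d, round(distance_ratio(200, d, rng), 3))
```

Under these assumptions the ratio is large in low dimensions, where some dataset point usually lies very close to the query, and falls toward 1 as d increases, which is the instability described in (i). The sketch does not probe regime (ii), since a dataset larger than $[4(1+\epsilon)]^d$ points is impractical to sample for any interesting d.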
About the journal:
Information Processing Letters invites submission of original research articles that focus on fundamental aspects of information processing and computing. This naturally includes work in the broadly understood field of theoretical computer science, although papers in all areas of scientific inquiry will be given consideration, provided that they describe research contributions credibly motivated by applications to computing and involve rigorous methodology. High-quality experimental papers that address topics of sufficiently broad interest may also be considered.
Since its inception in 1971, Information Processing Letters has served as a forum for timely dissemination of short, concise and focused research contributions. Continuing with this tradition, and to expedite the reviewing process, manuscripts are generally limited in length to nine pages when they appear in print.