Tutorial: Are You My Neighbor?: Bringing Order to Neighbor Computing Problems.

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Pub Date: 2019-07-25. DOI: 10.1145/3292500.3332292

D. Anastasiu, H. Rangwala, Andrea Tagarelli


Citations: 1

Abstract

Finding nearest neighbors is an important problem that has attracted much attention over the years, with applications in many fields, including market basket analysis, plagiarism and anomaly detection, community detection, and ligand-based virtual screening. As data become ever easier to collect, finding neighbors has become a potential bottleneck in analysis pipelines. Performing all pairwise comparisons on today's massive datasets is no longer feasible. The high computational complexity of the task has led researchers to develop approximate methods, which find many but not all of the nearest neighbors. Yet, for some types of data, efficient exact solutions have been found by carefully partitioning or filtering the search space in a way that avoids most unnecessary comparisons. In recent years, there have been several fundamental advances in our ability to efficiently identify appropriate neighbors, especially in non-traditional data such as graphs or document collections. In this tutorial, we provide an in-depth overview of recent methods for finding (nearest) neighbors, focusing on the intuition behind the design choices in those algorithms and on the utility of the methods in real-world applications. Our tutorial aims to provide a unifying view of "neighbor computing" problems, spanning numerical, graph, categorical, and sequential data, along with related application scenarios. For each type of data, we will review the current state-of-the-art approaches used to identify neighbors and discuss how neighbor search methods are used to solve important problems.
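To make the idea of exact search-space filtering concrete, here is a minimal sketch, not taken from the tutorial itself. It assumes sparse, unit-normalized vectors stored as Python dicts, cosine similarity, and a strictly positive threshold. The function names (`normalize`, `brute_force_pairs`, `indexed_pairs`) are hypothetical. An inverted index is used so that only pairs sharing at least one feature are ever scored; pairs with no shared feature have similarity 0 and cannot meet a positive threshold, so the result stays exact.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def normalize(vec):
    """Scale a sparse vector (feature -> weight) to unit length."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()}


def brute_force_pairs(vectors, threshold):
    """Baseline: score every pair, O(n^2) comparisons."""
    pairs = set()
    for i, j in combinations(range(len(vectors)), 2):
        a, b = vectors[i], vectors[j]
        sim = sum(w * b[f] for f, w in a.items() if f in b)
        if sim >= threshold:
            pairs.add((i, j))
    return pairs


def indexed_pairs(vectors, threshold):
    """Exact search that only scores pairs sharing a feature.

    Assumes threshold > 0, so pairs with no common feature
    (similarity 0) can be skipped without losing any result.
    """
    index = defaultdict(list)  # feature -> [(vector id, weight)]
    pairs = set()
    for j, vec in enumerate(vectors):
        scores = defaultdict(float)
        # Accumulate dot products with earlier vectors via the index.
        for f, w in vec.items():
            for i, wi in index[f]:
                scores[i] += w * wi
        for i, s in scores.items():
            if s >= threshold:
                pairs.add((i, j))
        # Index the current vector for later ones to find.
        for f, w in vec.items():
            index[f].append((j, w))
    return pairs


# Toy data (illustrative only): two near-duplicate "documents"
# and two that overlap only slightly.
docs = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 1, "c": 1},
    {"x": 1, "y": 1},
    {"y": 1, "z": 1},
]
vecs = [normalize(d) for d in docs]
print(indexed_pairs(vecs, 0.7))
```

On this toy data the indexed version scores only two candidate pairs instead of all six. Practical exact methods typically add further pruning bounds on top of this basic filter; the sketch shows only the simplest form of candidate filtering.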
