{"title":"Outlier Detection with Cluster Catch Digraphs","authors":"Rui Shi, Nedret Billor, Elvan Ceyhan","doi":"arxiv-2409.11596","DOIUrl":null,"url":null,"abstract":"This paper introduces a novel family of outlier detection algorithms based on\nCluster Catch Digraphs (CCDs), specifically tailored to address the challenges\nof high dimensionality and varying cluster shapes, which deteriorate the\nperformance of most traditional outlier detection methods. We propose the\nUniformity-Based CCD with Mutual Catch Graph (U-MCCD), the Uniformity- and\nNeighbor-Based CCD with Mutual Catch Graph (UN-MCCD), and their shape-adaptive\nvariants (SU-MCCD and SUN-MCCD), which are designed to detect outliers in data\nsets with arbitrary cluster shapes and high dimensions. We present the\nadvantages and shortcomings of these algorithms and provide the motivation or\nneed to define each particular algorithm. Through comprehensive Monte Carlo\nsimulations, we assess their performance and demonstrate the robustness and\neffectiveness of our algorithms across various settings and contamination\nlevels. We also illustrate the use of our algorithms on various real-life data\nsets. The U-MCCD algorithm efficiently identifies outliers while maintaining\nhigh true negative rates, and the SU-MCCD algorithm shows substantial\nimprovement in handling non-uniform clusters. Additionally, the UN-MCCD and\nSUN-MCCD algorithms address the limitations of existing methods in\nhigh-dimensional spaces by utilizing Nearest Neighbor Distances (NND) for\nclustering and outlier detection. Our results indicate that these novel\nalgorithms offer substantial advancements in the accuracy and adaptability of\noutlier detection, providing a valuable tool for various real-world\napplications. 
Keyword: Outlier detection, Graph-based clustering, Cluster catch digraphs,\n$k$-nearest-neighborhood, Mutual catch graphs, Nearest neighbor distance.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"24 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11596","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This paper introduces a novel family of outlier detection algorithms based on
Cluster Catch Digraphs (CCDs), specifically tailored to address the challenges
of high dimensionality and varying cluster shapes, which degrade the
performance of most traditional outlier detection methods. We propose the
Uniformity-Based CCD with Mutual Catch Graph (U-MCCD), the Uniformity- and
Neighbor-Based CCD with Mutual Catch Graph (UN-MCCD), and their shape-adaptive
variants (SU-MCCD and SUN-MCCD), which are designed to detect outliers in data
sets with arbitrary cluster shapes and high dimensions. We present the
advantages and shortcomings of these algorithms and explain the motivation
for defining each one. Through comprehensive Monte Carlo
simulations, we assess their performance and demonstrate the robustness and
effectiveness of our algorithms across various settings and contamination
levels. We also illustrate the use of our algorithms on various real-life data
sets. The U-MCCD algorithm efficiently identifies outliers while maintaining
high true negative rates, and the SU-MCCD algorithm shows substantial
improvement in handling non-uniform clusters. Additionally, the UN-MCCD and
SUN-MCCD algorithms address the limitations of existing methods in
high-dimensional spaces by utilizing Nearest Neighbor Distances (NND) for
clustering and outlier detection. Our results indicate that these novel
algorithms offer substantial advancements in the accuracy and adaptability of
outlier detection, providing a valuable tool for various real-world
applications.

Keywords: Outlier detection, Graph-based clustering, Cluster catch digraphs,
$k$-nearest-neighborhood, Mutual catch graphs, Nearest neighbor distance.
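
The abstract mentions Nearest Neighbor Distances (NND) as the ingredient that UN-MCCD and SUN-MCCD rely on in high dimensions, but it does not spell out how they are used. The sketch below only illustrates what an NND is and how it could, in principle, separate an isolated point from a tight cluster. It is not the UN-MCCD or SUN-MCCD algorithm: the function names and the median-based threshold with `factor=3.0` are assumptions made for illustration.

```python
import math
import statistics


def nearest_neighbor_distances(points):
    """Return, for each point, the Euclidean distance to its closest other point.

    Assumes at least two points; brute force, O(n^2) distance evaluations.
    """
    nnd = []
    for i, p in enumerate(points):
        closest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nnd.append(closest)
    return nnd


def flag_by_nnd(points, factor=3.0):
    """Flag points whose NND exceeds `factor` times the median NND.

    Hypothetical thresholding rule, chosen only to make the idea concrete;
    the paper's actual decision rule is not described in the abstract.
    """
    nnd = nearest_neighbor_distances(points)
    threshold = factor * statistics.median(nnd)
    return [d > threshold for d in nnd]


if __name__ == "__main__":
    # Four points on a unit square plus one far-away point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(nearest_neighbor_distances(data))
    print(flag_by_nnd(data))
```

In this toy data set the four corner points each have a nearest neighbor at distance 1, while the point at (10, 10) is much farther from everything else, so only it exceeds the threshold.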