Corruption-based anomaly detection and interpretation in tabular data
Authors: Chunghyup Mok, Seoung Bum Kim
Journal: Pattern Recognition, volume 159, Article 111149 (published 2024-11-09)
DOI: 10.1016/j.patcog.2024.111149
URL: https://www.sciencedirect.com/science/article/pii/S0031320324009002
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Recent advances in self-supervised learning (SSL) have proven crucial in effectively learning representations of unstructured data, encompassing text, images, and audio. Although the applications of these advances in anomaly detection have been explored extensively, applying SSL to tabular data presents challenges because of the absence of prior information on data structure. In response, we propose a framework for anomaly detection in tabular datasets using variable corruption. Through selective variable corruption and assignment of new labels based on the degree of corruption, our framework can effectively distinguish between normal and abnormal data. Furthermore, analyzing the impact of corruption on anomaly scores aids in the identification of important variables. Experimental results obtained from various tabular datasets validate the precision and applicability of the proposed method. The source code can be accessed at https://github.com/mokch/CAIT.
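The two ideas in the abstract, corrupting selected variables with labels tied to the degree of corruption, and reading variable importance from how corruption shifts an anomaly score, can be illustrated with a minimal sketch. It is not the authors' CAIT implementation (see the repository linked above for that). The abstract does not specify the corruption mechanism, the label scheme, or the scoring model, so everything here is an assumption: corrupted values are drawn from the same column of the data, the label is the fraction of corrupted variables, and the names `corrupt_row`, `make_pretext_dataset`, and `variable_impact` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def corrupt_row(
    row: Sequence[float],
    reference: Sequence[Sequence[float]],
    n_corrupt: int,
    rng: random.Random,
) -> Tuple[Row, float, List[int]]:
    """Replace n_corrupt randomly chosen variables with values sampled from
    the same column of `reference` (an assumed corruption scheme).

    Returns the corrupted row, a label equal to the fraction of variables
    corrupted, and the indices of the corrupted columns.
    """
    n_features = len(row)
    if not 0 <= n_corrupt <= n_features:
        raise ValueError("n_corrupt must be between 0 and the number of variables")
    columns = sorted(rng.sample(range(n_features), n_corrupt))
    corrupted = list(row)
    for j in columns:
        corrupted[j] = rng.choice(reference)[j]
    return corrupted, n_corrupt / n_features, columns


def make_pretext_dataset(
    data: Sequence[Sequence[float]],
    levels: Sequence[int],
    rng: random.Random,
) -> List[Tuple[Row, float]]:
    """Build (corrupted row, corruption label) pairs for every row and every
    corruption level; a model could then be trained to predict the label."""
    samples = []
    for row in data:
        for k in levels:
            corrupted, label, _ = corrupt_row(row, data, k, rng)
            samples.append((corrupted, label))
    return samples


def variable_impact(
    row: Sequence[float],
    reference: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
    n_draws: int,
    rng: random.Random,
) -> List[float]:
    """Mean absolute change in score(row) when each variable alone is
    replaced by values drawn from its column in `reference`."""
    base = score(row)
    impacts = []
    for j in range(len(row)):
        total = 0.0
        for _ in range(n_draws):
            perturbed = list(row)
            perturbed[j] = rng.choice(reference)[j]
            total += abs(score(perturbed) - base)
        impacts.append(total / n_draws)
    return impacts
```

In this sketch, `score` stands in for whatever anomaly scorer is trained on the pairs from `make_pretext_dataset`. Variables whose corruption moves the score most receive the largest impact values, which is one plausible reading of how corruption could help identify important variables.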
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.