Xu Tan, Jiawei Yang, Junqi Chen, Sylwan Rahardja, Susanto Rahardja
MSS-PAE: Saving Autoencoder-based Outlier Detection from Unexpected Reconstruction
Pattern Recognition, vol. 163, Article 111467. Published 2025-02-17. DOI: 10.1016/j.patcog.2025.111467
https://www.sciencedirect.com/science/article/pii/S003132032500127X
Abstract
Autoencoders (AEs) are widely used for Outlier Detection (OD) because of their strong modeling ability. However, AE-based OD methods suffer from the unexpected reconstruction problem: outliers are reconstructed with low error, which makes them hard to distinguish from inliers. The problem has two causes. First, when using the mean squared error, an AE may overconfidently produce good reconstructions in regions where outliers or potential outliers exist. To address this, aleatoric uncertainty is introduced to construct the Probabilistic Autoencoder (PAE), and the Weighted Negative Log-Likelihood (WNLL) is proposed to enlarge the score disparity between inliers and outliers. Second, an AE focuses on global modeling and lacks perception of local information. The Mean-Shift Scoring (MSS) method is therefore proposed, which uses the local relationships among data points to reduce the false inliers caused by the AE. Experiments on 32 real-world OD datasets demonstrate the effectiveness of the proposed methods. The combination of WNLL and MSS achieves a 45% relative performance improvement over the best baseline, and MSS improves the detection performance of multiple AE-based outlier detectors by 20% on average. The proposed methods have the potential to advance the development of AEs in OD.
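The two ideas named in the abstract can be illustrated with generic textbook building blocks. The sketch below is not the paper's WNLL or MSS formulation, which the abstract does not spell out. It assumes a hypothetical probabilistic decoder that outputs a mean and a log-variance per feature, scores a sample with the ordinary (unweighted) Gaussian negative log-likelihood, and performs one plain k-nearest-neighbour mean-shift step. All function names and the choice to exclude a point from its own neighbourhood are assumptions made for illustration.

```python
import math


def squared_error(x, mean):
    """Plain reconstruction error: sum of squared differences."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def gaussian_nll(x, mean, log_var):
    """Negative log-likelihood of x under independent Gaussians.

    `mean` and `log_var` stand in for the two outputs of a hypothetical
    probabilistic decoder (assumption: one pair per feature).
    """
    total = 0.0
    for xi, mi, lv in zip(x, mean, log_var):
        total += 0.5 * (math.log(2 * math.pi) + lv + (xi - mi) ** 2 / math.exp(lv))
    return total


def mean_shift_step(points, k):
    """Move each point to the mean of its k nearest neighbours.

    Assumption for this sketch: the point itself is excluded from its
    own neighbourhood.
    """
    shifted = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        nearest = others[:k]
        shifted.append([sum(col) / k for col in zip(*nearest)])
    return shifted


# Same squared error, different predicted variance: the sample with the
# smaller predicted variance receives the larger NLL score.
x, recon = [1.0], [0.0]
err = squared_error(x, recon)
nll_unit_var = gaussian_nll(x, recon, [0.0])
nll_small_var = gaussian_nll(x, recon, [math.log(0.01)])

# Three clustered points and one distant point: the distant point is
# pulled furthest by the mean-shift step.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
shifted = mean_shift_step(data, k=2)
shift_dist = [math.dist(p, s) for p, s in zip(data, shifted)]
```

In this toy setting the squared error cannot separate the two samples, while the likelihood score can, and the distant point moves much further than the clustered ones under the mean-shift step. How the paper weights the likelihood and how it combines the shifted points with the AE score are not stated in the abstract and are left out here.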
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.