DFNO: Detecting Fuzzy Neighborhood Outliers

IF 10.4 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · IEEE Transactions on Knowledge and Data Engineering · Pub Date: 2024-10-21 · DOI: 10.1109/TKDE.2024.3484448

Zhong Yuan;Peng Hu;Hongmei Chen;Yingke Chen;Qilin Li

Vol. 37, No. 1, pp. 200-209

Citations: 0

Abstract

Outlier Detection (OD) has attracted extensive research because of its applications in many fields. Neighborhood computing is one of the most widely used ideas in outlier analysis. However, neighborhood-based methods mainly model outlier detection with certainty strategies, so they cannot effectively handle fuzzy information in a dataset. Moreover, they focus mainly on numerical data and cannot effectively find outliers in mixed-attribute data. Fuzzy information granulation theory is an effective granular computing model that allows an object to belong to a set to a certain extent (i.e., with a membership degree), which makes it better suited to uncertainty problems such as fuzziness. In this work, we propose an outlier detection model based on fuzzy neighborhoods. First, a hybrid fuzzy similarity is constructed to granulate the set of objects into fuzzy information granules. Second, the fuzzy $k$-nearest neighbor is defined to describe fuzzy local information. Then, the fuzzy neighborhood density is defined to indicate the degree of aggregation of each object: the smaller an object's fuzzy neighborhood density, the more likely it is to be an outlier. Based on this idea, the fuzzy neighborhood deviation degree is defined to quantify the outlier degree of each object. Finally, the fuzzy deviation degree on the set of conditional attributes is constructed to give the outlier score of each object. Experimental comparisons with state-of-the-art methods show that the proposed method significantly improves AUC and is applicable to three types of data.
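The abstract names the pipeline stages but not their formulas. As a rough illustration only, here is a minimal LOF-style sketch for mixed numeric/nominal data under our own assumptions: the per-attribute similarity (range-normalized for numeric values, exact match for nominal values), density as the mean membership of an object's $k$ most similar neighbors, the bounded deviation ratio, and averaging deviations over the conditional attributes are all hypothetical choices, not the definitions used in DFNO. All function names are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[int, float, str]
Matrix = List[List[float]]


def attribute_similarity(data: Sequence[Sequence[Value]], j: int) -> Matrix:
    """Hypothetical hybrid fuzzy similarity on attribute j (an assumption).

    Numeric attribute: 1 - |x - y| / (max - min).
    Nominal attribute: 1 if the values are equal, else 0.
    """
    col = [row[j] for row in data]
    n = len(col)
    if all(isinstance(v, (int, float)) for v in col):
        span = (max(col) - min(col)) or 1.0  # avoid division by zero on constant columns
        return [[1.0 - abs(col[a] - col[b]) / span for b in range(n)] for a in range(n)]
    return [[1.0 if col[a] == col[b] else 0.0 for b in range(n)] for a in range(n)]


def fuzzy_knn(rel: Matrix, i: int, k: int) -> List[int]:
    """The k objects most similar to i (excluding i itself); ties broken by index."""
    others = [b for b in range(len(rel)) if b != i]
    others.sort(key=lambda b: (-rel[i][b], b))
    return others[:k]


def density(rel: Matrix, i: int, k: int) -> float:
    """Assumed fuzzy neighborhood density: mean membership of i's k nearest neighbors."""
    return sum(rel[i][b] for b in fuzzy_knn(rel, i, k)) / k


def deviation(rel: Matrix, i: int, k: int) -> float:
    """Assumed deviation degree in [0, 1]: 0 when i is at least as dense as its
    neighbors on average, approaching 1 as i becomes much sparser than them."""
    own = density(rel, i, k)
    nbr_avg = sum(density(rel, b, k) for b in fuzzy_knn(rel, i, k)) / k
    denom = max(own, nbr_avg)
    return 0.0 if denom == 0 else 1.0 - own / denom


def outlier_scores(data: Sequence[Sequence[Value]], k: int) -> List[float]:
    """Assumed aggregation: average the per-attribute deviation degrees."""
    if not 0 < k < len(data):
        raise ValueError("k must satisfy 0 < k < number of objects")
    m = len(data[0])
    rels = [attribute_similarity(data, j) for j in range(m)]
    return [sum(deviation(r, i, k) for r in rels) / m for i in range(len(data))]


if __name__ == "__main__":
    # Toy mixed data: one numeric and one nominal attribute; the last row is anomalous.
    rows = [(1.0, "red"), (1.1, "red"), (0.9, "red"), (1.05, "red"), (0.95, "red"), (5.0, "blue")]
    scores = outlier_scores(rows, k=2)
    print(max(range(len(rows)), key=scores.__getitem__))  # → 5
```

Averaging per-attribute deviations keeps a single mismatched nominal value from zeroing every similarity, which would happen if the attribute relations were instead intersected with a min t-norm; whether DFNO aggregates this way is not stated in the abstract.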




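Source journal: IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months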

Journal description: The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering aspects of computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems; AI techniques for knowledge and data management; tools and methodologies; distributed processing; real-time systems; architectures; data management practices; database design; query languages; security; fault tolerance; statistical databases; algorithms; performance evaluation; and applications.
