Outlier detection for set-valued data based on rough set theory and granular computing
Hai Lin, Zhaowen Li
International Journal of General Systems, vol. 52, pp. 385-413. Published 2022-10-17. Journal article.
DOI: 10.1080/03081079.2022.2132491 (https://doi.org/10.1080/03081079.2022.2132491)
Impact factor: 2.4. JCR: Q2 (Computer Science, Theory & Methods). Open access: no.
Citations: 0
Abstract
Outlier detection is widely used in industrial applications such as public security and fraud detection, and methods have been proposed from various perspectives and against different backgrounds. However, most outlier detection methods consider categorical or numerical data. There has been little research on outlier detection for set-valued data, even though a set-valued information system (SVIS) is a suitable way of handling missing values in data sets. This paper investigates outlier detection for set-valued data based on rough set theory (RST) and granular computing (GrC). First, the similarity between two information values in an SVIS is introduced, together with a variable parameter that controls the similarity. Then, tolerance relations on the object set are defined, and based on these tolerance relations, θ-lower and θ-upper approximations in an SVIS are put forward. Next, the outlier factor in an SVIS is presented and applied to various data sets. Finally, an outlier detection method for set-valued data based on RST and GrC is proposed, and the corresponding algorithms are designed. In numerical experiments on UCI data sets, the designed algorithm is compared with six other detection algorithms. The experimental results show that the designed algorithm is arguably the best choice in the context of an SVIS. For a comprehensive comparison, we use two criteria, the AUC value and the F1 measure, to show the superiority of the designed algorithm.
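The abstract's pipeline (similarity between set values, a θ-controlled tolerance relation, then θ-lower and θ-upper approximations) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: Jaccard similarity stands in for the paper's similarity measure, two objects are taken to be tolerant when every attribute reaches the threshold `theta`, and the toy table and all function names are hypothetical. The paper's outlier factor is not reproduced because the abstract does not define it.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy SVIS: object -> attribute -> set of possible values.
Table = Dict[str, Dict[str, FrozenSet[str]]]

table: Table = {
    "x1": {"lang": frozenset({"en"}), "os": frozenset({"linux"})},
    "x2": {"lang": frozenset({"en", "fr"}), "os": frozenset({"linux"})},
    "x3": {"lang": frozenset({"en"}), "os": frozenset({"linux", "mac"})},
    "x4": {"lang": frozenset({"de"}), "os": frozenset({"win"})},
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed similarity between two set values (not taken from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tolerance_class(tab: Table, x: str, theta: float) -> Set[str]:
    """Objects that are at least theta-similar to x on every attribute."""
    return {
        y
        for y, row in tab.items()
        if all(jaccard(tab[x][attr], row[attr]) >= theta for attr in tab[x])
    }


def lower_approx(tab: Table, target: Set[str], theta: float) -> Set[str]:
    """Objects whose whole tolerance class lies inside the target set."""
    return {x for x in tab if tolerance_class(tab, x, theta) <= target}


def upper_approx(tab: Table, target: Set[str], theta: float) -> Set[str]:
    """Objects whose tolerance class intersects the target set."""
    return {x for x in tab if tolerance_class(tab, x, theta) & target}


theta = 0.5
classes = {x: tolerance_class(table, x, theta) for x in table}
lower = lower_approx(table, {"x1", "x4"}, theta)
upper = upper_approx(table, {"x1", "x4"}, theta)
```

With `theta = 0.5`, the objects `x1`, `x2`, and `x3` fall into one tolerance class while `x4` is tolerant only with itself; an isolated object of this kind is what a rough-set-based outlier factor would be expected to score highly. Raising `theta` shrinks the tolerance classes, which is the role the abstract assigns to the variable parameter.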
About the journal:
International Journal of General Systems is a periodical devoted primarily to the publication of original research contributions to system science, basic as well as applied. However, relevant survey articles, invited book reviews, bibliographies, and letters to the editor are also published.
The principal aim of the journal is to promote original systems ideas (concepts, principles, methods, theoretical or experimental results, etc.) that are broadly applicable to various kinds of systems. The term “general system” in the name of the journal is intended to indicate this aim: the orientation to systems ideas that have a general applicability. Typical subject areas covered by the journal include: uncertainty and randomness; fuzziness and imprecision; information; complexity; inductive and deductive reasoning about systems; learning; systems analysis and design; and theoretical as well as experimental knowledge regarding various categories of systems. Submitted research must be well presented and must clearly state the contribution and novelty. Manuscripts dealing with particular kinds of systems which lack general applicability across a broad range of systems should be sent to journals specializing in the respective topics.