Local fuzzy rough attribute reduction for large-scale mixed data with limited missing labels based on local fuzzy self information
Authors: Zhaowen Li, Run Guo, Ning Lin, Tao Lu
Journal: Information Sciences, vol. 691, Article 121613 (published 2024-11-06; impact factor 8.1; category: Computer Science, Information Systems)
DOI: 10.1016/j.ins.2024.121613
URL: https://www.sciencedirect.com/science/article/pii/S0020025524015275
The era of big data has brought large-scale data of many types, and extracting value and rules from such data remains a challenge. Owing to various external and internal factors, large-scale data commonly has a limited number of missing labels. A large-scale mixed information system with limited missing labels (LSMDISLML) is typically handled with the local neighborhood rough set model (LNRS-model). However, such a model often uses one identical neighborhood radius for all numerical attributes, which may weaken the classification capability of the data. The local fuzzy rough set model (LFRS-model) can overcome this limitation. This paper studies local fuzzy rough attribute reduction for large-scale mixed data with limited missing labels, based on the LFRS-model, via local fuzzy self information and an overlap degree function. First, fuzzy relations on the entire sample set are established from the statistical distribution of the data; this allows different fuzzy similarity radii to be used when calculating similarity, so the relations adapt to different data distributions. Next, the samples with missing labels are discarded, since they make up a small proportion of the entire sample set and have little impact on the overall performance of the dataset, and the limited computing resources and storage space are focused on the samples with complete labels (denoted as the target set). Then, based on the target set, local fuzzy λ-upper and lower approximations are defined and the LFRS-model is constructed. This model reduces processing time and sources of error in large-scale data, and also improves data quality and the reliability of the experimental results. Local fuzzy λ-self information is then introduced and used to design a local fuzzy rough attribute reduction algorithm for an LSMDISLML.
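The abstract does not state the formulas, so the following Python sketch only illustrates the general idea under stated assumptions: a per-attribute fuzzy similarity whose radius is taken from that attribute's standard deviation, crisp equality for categorical attributes, the minimum t-norm to combine attributes, and the classical fuzzy rough lower approximation evaluated only on labeled samples. All function names are hypothetical, and the paper's λ parameterization is not reproduced here.

```python
import numpy as np

def fuzzy_relation(column, is_numeric, scale=1.0):
    """Fuzzy similarity matrix for one attribute (assumed form, not the paper's)."""
    col = np.asarray(column)
    if is_numeric:
        col = col.astype(float)
        # Assumption: the radius follows the attribute's own spread,
        # so each numerical attribute gets a different radius.
        radius = scale * col.std()
        if radius == 0:
            return np.ones((len(col), len(col)))
        diff = np.abs(col[:, None] - col[None, :])
        return np.clip(1.0 - diff / radius, 0.0, 1.0)
    # Categorical attribute: similarity is 1 for equal values, else 0.
    return (col[:, None] == col[None, :]).astype(float)

def combined_relation(columns, numeric_flags):
    """Combine per-attribute relations with the minimum t-norm."""
    rels = [fuzzy_relation(c, f) for c, f in zip(columns, numeric_flags)]
    return np.minimum.reduce(rels)

def local_lower_approximation(relation, labels):
    """Lower-approximation membership of each labeled sample in its own class.

    `labels` uses None for a missing label. Unlabeled samples are skipped,
    so work is spent only on the target set.
    """
    target = [i for i, y in enumerate(labels) if y is not None]
    result = {}
    for i in target:
        others = [j for j in target if labels[j] != labels[i]]
        # Classical form for a crisp class: inf over other-class samples of 1 - R(i, j).
        result[i] = min((1.0 - relation[i, j] for j in others), default=1.0)
    return result
```

For example, `local_lower_approximation(combined_relation([[0.0, 0.1, 1.0, 0.5]], [True]), [0, 1, 1, None])` returns memberships for samples 0, 1 and 2 only; the unlabeled fourth sample still contributes to the radius estimate but is never evaluated.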
Furthermore, an overlap degree function is introduced to evaluate and reorder the attributes by importance, so that redundant attributes with high overlap and low importance are eliminated first from the preordered attribute set. This strategy improves the efficiency of obtaining the optimal subset. Finally, a series of experiments is carried out. The experimental results show that the designed algorithm performs well in classification and outlier detection tasks, surpassing four existing algorithms.
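The reordering strategy can be pictured as a backward elimination over a preordered attribute list. The sketch below is a generic illustration, not the paper's algorithm: `score` and `overlap` are hypothetical callables supplied by the caller (the abstract defines neither), and it assumes that a higher `score` means a better attribute subset.

```python
def backward_reduce(attributes, score, overlap, tol=1e-9):
    """Greedy backward elimination over attributes preordered by overlap.

    score(subset)  -> float, higher is better (assumption; hypothetical callable).
    overlap(attr)  -> float, higher means more redundant (hypothetical callable).
    """
    # Try the attributes that look most redundant first.
    order = sorted(attributes, key=overlap, reverse=True)
    kept = list(attributes)
    baseline = score(kept)
    for attr in order:
        trial = [a for a in kept if a != attr]
        # Drop the attribute only if the subset quality does not degrade.
        if trial and score(trial) >= baseline - tol:
            kept = trial
    return kept
```

With `score = lambda s: 1.0 if "a" in s else 0.0` and overlaps `{"a": 0.1, "b": 0.9, "c": 0.5}`, the call `backward_reduce(["a", "b", "c"], score, overlaps.get)` removes `b` and `c` and keeps `["a"]`. Visiting likely-redundant attributes early shrinks the subset sooner, so later evaluations of `score` run on fewer attributes.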
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.