A new feature selection based on class dependency and feature dissimilarity
Niphat Claypo, S. Jaiyen
2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015-11-23
DOI: 10.1109/ICAICTA.2015.7335366 (https://doi.org/10.1109/ICAICTA.2015.7335366)
Feature selection is an important data preprocessing task in data mining. Data sets often contain many features, which slows the learning process of a classifier and makes it unsuitable for big data analytics. This paper proposes a feature selection method based on class dependency and feature dissimilarity (CDFD) that uses mutual information and Euclidean distance. If the data set contains discrete data, mutual information is used to determine the dependency between each feature and the class. If the data set contains continuous data, the correlation between the feature and the class is used instead. The Euclidean distance measures the dissimilarity between features and is used to remove duplicated features. Experiments are conducted on five data sets. The experimental results show that the proposed method can reduce the number of features in a data set and reduce the classification error of classifiers. Furthermore, it can be applied to both discrete and continuous data, and it can help classifiers improve their classification accuracy and reduce the computational time for learning.
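The abstract does not give the exact procedure, so the following is only a minimal sketch of how the two ingredients could fit together, not the authors' implementation. It assumes that features are ranked by relevance (mutual information for discrete data, absolute Pearson correlation with numeric class labels for continuous data) and that a feature is dropped when its Euclidean distance to an already selected feature falls below a threshold. The function names and the `min_relevance` and `min_distance` parameters are hypothetical, and the features are assumed to be scaled to comparable ranges beforehand.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint frequency table, normalised to a joint probability table.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x
    py = joint.sum(axis=0, keepdims=True)  # marginal of y
    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def relevance(feature, labels, discrete):
    """Class dependency of one feature (assumed scoring rule)."""
    if discrete:
        return mutual_information(feature, labels)
    # Continuous case: absolute Pearson correlation with numeric labels.
    if np.std(feature) == 0 or np.std(labels) == 0:
        return 0.0
    return float(abs(np.corrcoef(feature, labels)[0, 1]))


def select_features(X, y, discrete, min_relevance, min_distance):
    """Greedy selection: most relevant first, skipping near-duplicates.

    min_relevance and min_distance are hypothetical thresholds; the
    abstract does not state how the cut-offs are chosen.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.array(
        [relevance(X[:, j], y, discrete) for j in range(X.shape[1])]
    )
    # Stable sort so that ties keep their original column order.
    order = np.argsort(-scores, kind="stable")
    selected = []
    for j in order:
        if scores[j] < min_relevance:
            continue
        # Keep the feature only if it is dissimilar to every kept feature.
        if all(
            np.linalg.norm(X[:, j] - X[:, k]) >= min_distance
            for k in selected
        ):
            selected.append(int(j))
    return selected, scores
```

Calling `select_features(X, y, discrete=True, min_relevance=0.1, min_distance=0.5)` returns the indices of the kept columns together with the per-feature relevance scores. Both thresholds would need tuning per data set.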