Title: A Novel Feature Selection Method for Risk Management in High-Dimensional Time Series of Cryptocurrency Market
Authors: Erfan Varedi, R. Boostani
Journal: ACM Journal of Data and Information Quality, volume 27 1, pages 1-14
Journal impact factor: 1.5 (JCR Q3, Computer Science, Information Systems)
Published: 2023-05-26
DOI: https://doi.org/10.1145/3597309
Citations: 1
Abstract
This study presents a novel feature selection approach for classifying positive and negative risk in the highly volatile cryptocurrency market. The approach maximizes information gain while simultaneously minimizing the similarity among the selected features, yielding a feature set that improves classification accuracy. The proposed method was compared with other feature selection techniques: sequential and bidirectional feature selection, univariate feature selection, and the least absolute shrinkage and selection operator (LASSO). To evaluate the feature selection techniques, several classifiers were employed: XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression, long short-term memory, and deep neural networks. The features were derived from the time series of the Bitcoin, Binance, and Ethereum cryptocurrencies. Across classifiers, XGBoost and random forest gave the best results on these time series datasets, and the proposed feature selection method achieved the best results on two of the three cryptocurrencies. The best accuracy ranged from 55% to 68%, depending on the time series. The features were preprocessed rather than raw: the raw candle data were used to derive features that describe the problem and help the classifiers predict the labels.
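The idea of maximizing information gain while minimizing similarity among the selected features can be illustrated with a greedy sketch. This is not the authors' algorithm: the abstract does not say how the two objectives are combined or which similarity measure is used. The code below assumes absolute Pearson correlation as the similarity measure, equal-frequency binning to estimate information gain, and a hypothetical trade-off weight `alpha`; the function names are likewise illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete values."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, y, bins=8):
    """H(y) - H(y | binned feature); equal-frequency binning is an assumption."""
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1)[1:-1])
    binned = np.digitize(feature, edges)
    conditional = 0.0
    for b in np.unique(binned):
        mask = binned == b
        conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional

def select_features(X, y, k, alpha=1.0, bins=8):
    """Greedily pick k columns of X.

    Score = information gain - alpha * mean |Pearson correlation| with the
    columns already selected. Both the score and alpha are assumptions.
    Constant columns are not handled (their correlation is undefined).
    """
    n_features = X.shape[1]
    gain = np.array([information_gain(X[:, j], y, bins) for j in range(n_features)])
    similarity = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(gain))]  # start with the most informative feature
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        scores = [gain[j] - alpha * similarity[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

Calling `select_features(X, y, k=10)` on a feature matrix `X` (samples by features) and a binary label vector `y` returns the indices of the chosen columns. With `alpha=0` the procedure reduces to ranking by information gain alone, which makes the effect of the redundancy penalty easy to compare.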