{"title":"Performance Analysis of XGBoost Algorithm to Determine the Most Optimal Parameters and Features in Predicting Stock Price Movement","authors":"Affan Ardana","doi":"10.31315/telematika.v20i1.9329","DOIUrl":null,"url":null,"abstract":"Purpose: The research aims to find the best parameters and features for predicting stock price movement using the XGBoost algorithm. The parameters are searched using the RMSE value, and the features are searched using the importance value.Design/methodology/approach: The research data is the stock data of Amazon.com company (AMZN). The dataset contains the Date, Low, Open, Volume, High, Close, and Adjusted Close features. The dataset is ensured to have no missing data by handling missing values. The input feature is selected using the Pearson Correlation feature selection method. To prevent the difference between the highest and lowest stock price from being too far apart, the data is scaled using the scaling method. To avoid bias that may appear in the prediction result, cross-validation is used with the Min Max Scaling method, which will devide the dataset into training data and testing data within a range of 30 days after the training data. The parameters to be tested include n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.Findings/result: The result of the research that a learning rate of 0.05 and a tree depth of 5 obtained the lowest RMSE result compared to other models, with an RMSE of 0.009437. The Low feature obtained the highest importance value among all the models built.Originality/value/state of the art: This study used testing data within a range of 30 days after the training data and used a combination of parameters, including n_estimator = 500, early stopping round = 3, learning rate = 0.01, 0.05, 0.1, amd max_depth (tree depth) = 3, 4, 5. ","PeriodicalId":31716,"journal":{"name":"Telematika","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31315/telematika.v20i1.9329","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Purpose: The research aims to find the best parameters and features for predicting stock price movement using the XGBoost algorithm. The parameters are selected based on the RMSE value, and the features are selected based on the importance value.

Design/methodology/approach: The research data is the stock data of the Amazon.com company (AMZN). The dataset contains the Date, Low, Open, Volume, High, Close, and Adjusted Close features. Missing values are handled so that the dataset contains no missing data. The input features are selected using the Pearson correlation feature selection method. To keep the gap between the highest and lowest stock prices from becoming too large, the data is scaled using the Min-Max scaling method. To avoid bias in the prediction results, cross-validation is used, which divides the dataset into training data and testing data, with the testing data covering a range of 30 days after the training data. The parameters tested are n_estimator = 500, early stopping rounds = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.

Findings/result: The results show that a learning rate of 0.05 and a tree depth of 5 produced the lowest RMSE among the models, with an RMSE of 0.009437. The Low feature obtained the highest importance value across all the models built.

Originality/value/state of the art: This study used testing data within a range of 30 days after the training data and tested a combination of parameters: n_estimator = 500, early stopping rounds = 3, learning rate = 0.01, 0.05, 0.1, and max_depth (tree depth) = 3, 4, 5.
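The following Python sketch illustrates the kind of pipeline the abstract describes: Pearson-correlation feature selection, Min-Max scaling, a 30-day out-of-sample test window, a grid over the stated XGBoost parameters, and feature-importance inspection. It is a minimal illustration, not the authors' code: the CSV path, column names, correlation threshold, and the use of a single chronological split (rather than the paper's full cross-validation scheme) are assumptions.

```python
# Hypothetical sketch of the workflow described in the abstract.
# Assumes an AMZN OHLCV file with Date, Open, High, Low, Close, Adj Close, Volume;
# the path, threshold, and single train/test split are illustrative assumptions.
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

df = pd.read_csv("AMZN.csv", parse_dates=["Date"]).sort_values("Date")
df = df.dropna()  # handle missing values so the dataset is complete

# Pearson correlation against the target (Close) to choose input features
target = "Close"
candidates = ["Open", "High", "Low", "Volume", "Adj Close"]
corr = df[candidates + [target]].corr(method="pearson")[target].drop(target)
features = corr[corr.abs() > 0.5].index.tolist()  # assumed threshold

# Min-Max scaling so large price ranges do not dominate the model
scaler_X, scaler_y = MinMaxScaler(), MinMaxScaler()
X = scaler_X.fit_transform(df[features])
y = scaler_y.fit_transform(df[[target]]).ravel()

# Chronological split: the last 30 days after the training data form the test set
X_train, X_test = X[:-30], X[-30:]
y_train, y_test = y[:-30], y[-30:]

best = None
for lr in (0.01, 0.05, 0.1):
    for depth in (3, 4, 5):
        model = xgb.XGBRegressor(
            n_estimators=500,
            learning_rate=lr,
            max_depth=depth,
            early_stopping_rounds=3,
        )
        model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
        rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
        if best is None or rmse < best[0]:
            best = (rmse, lr, depth, model)

rmse, lr, depth, model = best
print(f"best RMSE = {rmse:.6f} at learning_rate = {lr}, max_depth = {depth}")
print(dict(zip(features, model.feature_importances_)))  # importance per input feature
```

In this sketch the scaler is fit on the full series and the 30-day test window doubles as the early-stopping evaluation set, mirroring the order of steps in the abstract; a stricter evaluation would fit the scaler on the training data only and reserve a separate validation slice for early stopping.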