Clustering High-Dimensional Stock Data using Data Mining Approach

Dhea Indriyanti, Arian Dhini
DOI: 10.1109/ICSSSM.2019.8887724
Published in: 2019 16th International Conference on Service Systems and Service Management (ICSSSM), July 2019
Citations: 1

Abstract

In recent years, the number of stock investors in Indonesia has increased rapidly, so stock analysis is needed to support investors' investment plans. Clustering helps investors select appropriate stocks. However, stock prices vary over time, which makes selecting stocks for investment difficult. In addition, stock price time series are high-dimensional data influenced by many factors; in this study, the high-dimensional data are obtained from the time frame of each factor. It is therefore important to use a technique suited to clustering high-dimensional data. This paper presents High Dimensional Data Clustering (HDDC), a model-based clustering method built on the Gaussian Mixture Model and fitted with the Expectation-Maximization (EM) algorithm. HDDC via the EM algorithm gives more robust results and makes it possible to impose additional modeling assumptions. The paper also combines HDDC via EM with Principal Component Analysis (PCA), the most popular feature extraction technique, and compares HDDC alone against the combination of HDDC and PCA to determine which method clusters high-dimensional time series data more effectively. PCA reduces the 155 data features to 7 principal components. Although PCA reduced the time needed to build the model, HDDC via the EM algorithm handled the high-dimensional data better than its combination with PCA.
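The abstract names the ingredients (a Gaussian mixture fitted by EM, HDDC's cluster-specific subspaces, and a PCA reduction from 155 features to 7 components) without giving their details. Below is a minimal NumPy sketch, assuming the simplest HDDC-style covariance: each cluster keeps d free eigenvalues inside its own subspace and a single noise variance b outside it, with d fixed and shared across clusters. The function names `hddc_em` and `pca_reduce`, the farthest-point initialization, the choice d = 2, and the synthetic data are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto its top principal components (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def _hddc_log_density(X, mu, Q, a, b):
    """log N(x | mu, Sigma) with Sigma = Q diag(a) Q^T + b (I - Q Q^T), without forming Sigma."""
    p, d = X.shape[1], Q.shape[1]
    D = X - mu
    proj = D @ Q  # coordinates inside the cluster-specific subspace
    inside = np.sum(proj**2 / a, axis=1)
    outside = (np.sum(D**2, axis=1) - np.sum(proj**2, axis=1)) / b
    log_det = np.sum(np.log(a)) + (p - d) * np.log(b)
    return -0.5 * (inside + outside + log_det + p * np.log(2 * np.pi))


def hddc_em(X, n_clusters, d, n_iter=100, tol=1e-6, seed=0, eps=1e-8):
    """EM for a Gaussian mixture whose k-th covariance has d free eigenvalues a_kj inside
    its own subspace Q_k and a single variance b_k outside it. Assumption: d is fixed and
    shared by all clusters (full HDDC can select a separate d_k per cluster)."""
    n, p = X.shape
    if not 0 < d < p:
        raise ValueError("need 0 < d < number of features")
    rng = np.random.default_rng(seed)
    # Farthest-point initialization, then hard-assign each row to its nearest center.
    centers = [X[rng.integers(n)]]
    for _ in range(n_clusters - 1):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
    R = np.eye(n_clusters)[labels]  # responsibilities, shape (n, K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        log_p = np.empty((n, n_clusters))
        for k in range(n_clusters):
            # M-step: weight, mean and subspace/noise parameters of cluster k.
            nk = R[:, k].sum() + eps
            mu = R[:, k] @ X / nk
            D = X - mu
            S = (D * R[:, k, None]).T @ D / nk
            vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
            vals, vecs = vals[::-1], vecs[:, ::-1]
            a = np.maximum(vals[:d], eps)
            b = max((np.trace(S) - vals[:d].sum()) / (p - d), eps)
            log_p[:, k] = np.log(nk / n) + _hddc_log_density(X, mu, vecs[:, :d], a, b)
        # E-step: normalize with log-sum-exp for numerical stability.
        m = log_p.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
        R = np.exp(log_p - log_norm)
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return R.argmax(axis=1), ll


# Hypothetical synthetic data (not the paper's): 60 "stocks" x 155 features, 3 groups.
data_rng = np.random.default_rng(1)
truth = np.repeat([0, 1, 2], 20)
means = np.zeros((3, 155))
means[1] += 3.0
means[2] -= 3.0
X = means[truth] + data_rng.normal(size=(60, 155))

labels_raw, _ = hddc_em(X, n_clusters=3, d=2)                 # HDDC on all 155 features
labels_pca, _ = hddc_em(pca_reduce(X, 7), n_clusters=3, d=2)  # HDDC after PCA to 7 components
```

Because the density is evaluated through the subspace projection and b, the sketch never inverts a 155 x 155 covariance, which in this example would be singular since each cluster has only 20 observations. That is the property that lets an HDDC-type model run on the raw features, while passing `pca_reduce(X, 7)` instead of `X` mirrors the shape of the HDDC-plus-PCA pipeline compared in the abstract.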