Proceedings of the 6th International Conference on Information System and Data Mining: Latest Articles

N-gram and Word2Vec Feature Engineering Approaches for Spam Recognition on Some Influential Twitter Topics in Saudi Arabia

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546173

Ahmed M. Balfagih, Vlado Keselj, Stacey Taylor

Social media platforms, such as Twitter, have become powerful sources of information on people's perception of major events. Many people use Twitter to express their views on various issues and events, and to form their opinions on the diverse economic, political, technical, and social occurrences related to their daily lives. Spam and non-relevant tweets are a major challenge for Twitter trend detection. Saudi Arabia is a top-ranked country in Twitter usage worldwide, and in recent years has experienced difficulties due to the rise of hashtags based on misleading tweets and spam. The goal of this paper is to apply machine learning techniques to identify spam in Saudi tweets collected up to the end of 2020. To date, spam detection on Twitter data has mostly been done in English, leaving other major languages, such as Arabic, insufficiently covered. Additionally, publicly accessible Arabic Twitter datasets are hard to find. For our research, we use eight Twitter datasets on significant topics in politics, health, national affairs, economy, and sport to train and evaluate different machine learning algorithms, with a focus on two feature generation techniques based on N-grams and Word2Vec embeddings. One contribution of this paper is providing these new labelled datasets with embeddings. The experimental results show that embeddings improve on N-grams in the more balanced datasets rather than in the more unbalanced ones. We also find that the Random Forest algorithm outperforms the other algorithms in most experiments.
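As a rough illustration of the N-gram side of this comparison, the sketch below builds word-level unigram and bigram count features using only the Python standard library. The sample texts, the function names, and the whitespace tokenizer are assumptions made for illustration; the abstract does not specify the paper's preprocessing, N-gram range, or classifier settings.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` (whitespace tokenizer, an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count all n-grams for n = 1..max_n, giving a sparse bag-of-n-grams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


def vectorize(texts, max_n=2):
    """Map each text to a dense count vector over a shared, sorted vocabulary."""
    bags = [ngram_features(t, max_n) for t in texts]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]


# Hypothetical example texts (not taken from the paper's datasets).
tweets = ["win a free phone now", "free phone offer", "match starts now"]
vocab, matrix = vectorize(tweets)
```

Each row of `matrix` could then be passed to a classifier such as a Random Forest. An embedding-based pipeline would instead replace the count vector with, for example, an average of per-word Word2Vec vectors.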

Citations: 3

Link prediction with Simple Graph Convolution and regularized Simple Graph Convolution

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546163

Patrick Pho, Alexander V. Mantzaris

Attributed graphs are used to model real-life systems in many domains, such as social science and biology. Link prediction is an important task on attributed graphs, with a wide range of useful applications. Simple link prediction approaches are limited in their ability to capture network topology and node attributes. Graph Neural Networks (GNNs) provide an efficient framework that incorporates node attributes and connectivity to produce informative embeddings for many downstream tasks, including link prediction. In this work, we study two variants of GNNs, namely Simple Graph Convolution (SGC) and its regularized extension, for link prediction on three citation datasets. While it is fast and efficient, our model is insufficient to capture the complex node connectivities. On the other hand, imposing regularization reduces overfitting and improves model performance.
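The feature propagation at the core of SGC can be sketched in a few lines. The code below computes S^K X, where S is the symmetrically normalized adjacency matrix with self-loops, and then scores a candidate link with a sigmoid of the embedding inner product. The toy graph, the function names, the omission of the learned weight matrix, and the inner-product decoder are all assumptions for illustration; the abstract does not describe the paper's decoder or regularizer.

```python
import math


def normalized_adjacency(edges, num_nodes):
    """Build S = D^{-1/2} (A + I) D^{-1/2} for an undirected graph (dense lists)."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loops: A + I
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def sgc_features(edges, features, k=2):
    """Propagate node features k times (S^k X) with no nonlinearity in between."""
    s = normalized_adjacency(edges, len(features))
    h = features
    for _ in range(k):
        h = matmul(s, h)
    return h


def link_score(h, u, v):
    """Assumed decoder: sigmoid of the inner product of two node embeddings."""
    dot = sum(a * b for a, b in zip(h[u], h[v]))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy path graph 0-1-2-3 with one-hot node attributes (hypothetical data).
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
h = sgc_features(edges, features, k=2)
```

In this toy graph, `link_score(h, 1, 2)` is higher than `link_score(h, 0, 3)`, since nodes 1 and 2 share more of their propagated neighborhood. In a trained model, a learned linear map would be applied to `h` before scoring.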

Citations: 1

BD-ECG: Identification of Myocardial Infarction in ECG via Behavior Coupling

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546169

Uzair Iqbal, Teh Ying Wah, Muhammad Habib Ur Rehman, Muhammad Bilal, Adeel Ahmed

In cardiovascular disease, early detection and identification of the relationships between different diseases remain open problems for cardiologists. In this paper, we propose a novel scheme for behavioral detection in electrocardiography data, named Behavioral detection-Electrography. Behavioral detection-Electrography is used for early detection of abnormalities in electrography, especially myocardial infarction. It embeds a two-tier architecture in which we integrate behavioral relationship concepts with a myocardial detection algorithm. In future, the integrated scheme will help us identify whether a cardiovascular condition is normal or abnormal.

Citations: 1

Access Control using Blockchain: A Taxonomy and Review

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546165

S. Malik, M. A. Shah

After the introduction of blockchain as a cryptocurrency platform, researchers and industry leaders have come up with novel ways of utilizing the technology. One emerging use case for blockchain is access control, since it solves the problem of trust deficit while being distributed, auditable and private. This paper lists various access control models that have been proposed or implemented on blockchain platforms. It also provides an analysis of the performance of some of these models. Analysis shows that access control models are progressing from the traditional identity-based systems to role/attribute-based systems with observed structural shifts from centralized to decentralized. Key benefits of blockchain-based access control systems noted are improved transparency over access control and logging; improved, more complex policy management; and the possibility to implement access controls in trust-less systems.

Citations: 0

Traffic Sign Recognition with Vision Transformers

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546166

Haolan Wang

Traffic sign recognition is an integral part of future autonomous driving systems. Deep learning has been applied to this task, but the performance of recent vision Transformers remains unexplored. In this study, eight different vision Transformers are validated on three real-world traffic sign datasets for the first time. The experimental results demonstrate that the performance of the best vision Transformer lies between that of the pre-trained DenseNet and that of the DenseNet trained from scratch. In addition, the best vision Transformer model requires less training time than DenseNet.

Citations: 2

Influence of Transformational Leadership on Emotional Labor of Employees: Mediating Role of Psychological Empowerment

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546179

Pengfei Cheng, Jingxuan Jiang, Shasha Tian

Based on the job demands-resources model, this paper explores the mediating role of psychological empowerment in the effect of transformational leadership on employees’ emotional labor. We report empirical results indicating that transformational leadership has a strong negative relationship to surface behavior and a strong positive relationship to deep behavior, while psychological empowerment mediates the relationship between transformational leadership and employees' emotional labor. Specifically, transformational leadership can indirectly influence the deep acting of front-line employees through meaning, influence, self-determination and self-efficacy. Transformational leadership can also indirectly influence front-line employees' surface acting through influence, self-determination and self-efficacy, while meaning has no significant influence on surface acting.

Citations: 0

A Nonsynaptic Memory Based Neural Network for Hand-Written Digit Classification Using an Explainable Feature Extraction Method

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546168

F. Faghihi, Siqi Cai, A. Moustafa, Hany Alashwal

Deep learning methods have been developed for handwritten digit classification. However, these methods work as ‘black-boxes’ and need large training data. In this study, an explainable feature extraction method is developed for handwritten digit classification. The features of the digit image include horizontal, vertical, and orthogonal lines as well as full or semi-circles. In our proposed method, such features are extracted using 10 neurons as computational units. Specifically, the neurons learn the features through network training and store them inside the neurons as non-synaptic memory. Following that, the trained neurons are used to retrieve information from test images and assign them to digit categories. Our method achieves an accuracy of 75% using 0.016% of the training data, and an accuracy of 86% using one epoch of the whole training data of the MNIST dataset. To the best of our knowledge, this is the first model that stores information inside a few single neurons (i.e., non-synaptic memory) instead of storing the information in synapses of connected feed-forward layers. Because single neurons compute individually, we expect that this class of neural networks can be combined with synaptic memory architectures, and that the combination will show higher performance than traditional neural networks.

Citations: 2

Docker Container based Crowd Control Analysis Using Dask Hadoop Framework

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546159

G. RadhikaE., Jai Bhaarath, Naveen, Ritesh Nirmal

Crowd control is a public policy technique in which massive crowds are managed in order to avoid possible issues or threats caused by COVID-19 and over-crowding. In this pandemic, social distancing is critical, as there is a high chance of being infected in a crowd. With mounting fears about public disease transmission, crowd monitoring is crucial in these testing times. In the existing system, the model takes more time and resources to process the data from the crowd control application, resulting in delayed prediction. Early prediction of the crowd level will help people and government agencies to control and monitor the crowd. Hence, the main goal of the proposed system is to process a large amount of input from the crowd control application in minimal time using a Dynamic Task Scheduling (Dask) based Hadoop framework in a multi-node Docker cluster. The multi-node cluster processes the input data in different clusters. The data from each cluster is fed to a model that predicts and forecasts the crowd count at a location. The models considered for evaluation are RNN_LSTM and ARIMA. The results show that the RNN_LSTM model provides a better accuracy of 97%, compared to 89% for ARIMA. RNN_LSTM also shows a 40% decrease in Mean Absolute Error (MAE) and a 30% decrease in Root Mean Squared Error (RMSE) relative to the existing ARIMA model. The proposed system is available to the public as an application and enables them to decide whether or not to visit a particular place.
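For readers unfamiliar with the two error metrics, the following sketch shows how MAE, RMSE, and a relative decrease between two forecasting models can be computed. The crowd counts and forecasts are made-up numbers chosen for easy arithmetic and the function names are hypothetical; nothing here reproduces the paper's data or results.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: average absolute difference between observations and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """RMSE: square root of the average squared forecast error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_decrease(baseline, candidate):
    """Fractional reduction of an error metric relative to a baseline model."""
    return (baseline - candidate) / baseline


# Hypothetical crowd counts and two hypothetical sets of forecasts.
actual = [120, 135, 150, 160]
baseline_forecast = [110, 145, 140, 170]   # stands in for a baseline model
candidate_forecast = [116, 139, 146, 164]  # stands in for an improved model

mae_drop = relative_decrease(
    mean_absolute_error(actual, baseline_forecast),
    mean_absolute_error(actual, candidate_forecast),
)
rmse_drop = relative_decrease(
    root_mean_squared_error(actual, baseline_forecast),
    root_mean_squared_error(actual, candidate_forecast),
)
```

A reported "40% decrease in MAE" corresponds to `relative_decrease` returning 0.4 when the baseline is ARIMA and the candidate is RNN_LSTM.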

Citations: 0

Effective online learning management system to improve and enhance the online learning and student engagement experience.: This document contains how the proposed solution, an extension, can help enhance distance learning.

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546172

T. Ginige, Shenal Thilaksha Vanderwall

With the spread of Covid-19, there has been a shift towards distance learning, and teachers find it difficult to keep track of which students are attentive during class. Previously, the traditional classroom environment helped teachers keep track of students who were not fully concentrating during lessons. The shift to online learning has made it much more difficult for teachers to keep track of students who are idling during the lecture period. To address this, we propose an extension for teachers that integrates with existing video conferencing platforms. This solution will help teachers know whether a student has been attentive during class by keeping track of their peripheral device activity, such as mouse movements or keystrokes. Previous studies have tracked students' eye movements and browser history, but no solution has been developed that can easily ‘plug and play’ into an existing platform so that teachers can see, in real time, how a student is interacting with the lecture. The main objective of this research is to help enhance the learning experience of a student by keeping the teacher aware of the student's progress, just as in a traditional classroom environment.
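One simple way to turn peripheral-device activity into an attentiveness signal is to look for long gaps between input events. The sketch below is a hypothetical illustration of that idea only: the function names, the 60-second idle threshold, and the event timestamps are assumptions, and the abstract does not describe how the proposed extension actually measures attention.

```python
def idle_intervals(event_times, session_start, session_end, idle_threshold=60.0):
    """Return (start, end) gaps longer than `idle_threshold` seconds with no input events."""
    points = [session_start] + sorted(event_times) + [session_end]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > idle_threshold]


def active_fraction(event_times, session_start, session_end, idle_threshold=60.0):
    """Fraction of the session that is not covered by an idle interval."""
    idle = sum(b - a for a, b in idle_intervals(
        event_times, session_start, session_end, idle_threshold))
    return 1.0 - idle / (session_end - session_start)


# Hypothetical mouse/keyboard event timestamps, in seconds from session start.
events = [5, 20, 50, 200, 230, 290]
gaps = idle_intervals(events, 0, 300)
share_active = active_fraction(events, 0, 300)
```

A lack of input does not by itself prove inattention (a student may simply be watching), so any threshold of this kind would need to be tuned and interpreted with care.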

Citations: 0

Long Short-Term Memory for Bitcoin Price Prediction

Proceedings of the 6th International Conference on Information System and Data Mining

Pub Date : 2022-05-27 DOI: 10.1145/3546157.3546162

Jordan Jones, Doga Demirel

With time-series data being prevalent everywhere, there is a need to predict this data accurately. This kind of data includes weather data, financial data such as stock prices, and cryptocurrency prices. Most of the trades in the stock market today are made using artificial intelligence. An estimated 50% of trades were done using an algorithm, which increased to 60% in 2020 [1]. This highlights the demand for reliable and accurate predictions. Predicting the price is very challenging. Some success has been seen in predicting stock prices, but not many studies have been done on cryptocurrency. Cryptocurrency, specifically Bitcoin, has seen a substantial increase in popularity, and the price has reflected this popularity. The price also follows patterns, specifically when reaching new all-time highs. In this work, an artificial intelligence model is created and trained on previous data to observe these patterns and predict the next price. The model chosen for this subject is Long Short-Term Memory (LSTM). LSTMs are capable of finding patterns in time series data, and LSTM solves the vanishing gradient problem present in the RNN (Recurrent Neural Network). The market price of Bitcoin is used as input here. The input data values range from 20,000 up to 65,000 in testing. Once an optimal starting point is found, there is an 80/20 split of the data: 80 percent is used for training and 20 percent is used for testing. With the data split, one of the most important jobs is figuring out the optimal lags (how far back into the past to look) used to predict values. The range for this experiment is set to the ten previous price days. Epochs (number of iterations) and batch size (how much of the training data is used per epoch) are tested at different values to find optimal solutions, with batch size values such that batchSize ∈ {20, 21…26} and epochs such that epochs ∈ {10, 20….70}. Overfitting is hard to detect and thus can be an issue with too many epochs and smaller batch sizes (smaller means more of the training data is used). With too little training, the LSTM will not learn the data patterns and thus will not have good accuracy. This is why different configurations are used in the experiment to maximize accuracy. This LSTM achieved a Mean Absolute Percentage Error score of 3.23% and a Root Mean Squared Error score of 1892.87 when predicting next-day prices throughout 350.
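The data preparation described here (a ten-day lag window, an 80/20 split, and a percentage-error metric) can be sketched without any deep learning library. The code below uses a made-up linear price series and a naive persistence forecast in place of the LSTM; the function names are hypothetical, and the chronological (unshuffled) split is an assumption, since the abstract does not say how the 80/20 split was drawn.

```python
def make_lagged_samples(prices, lags=10):
    """Turn a price series into (previous `lags` prices, next price) pairs."""
    return [(prices[i - lags:i], prices[i]) for i in range(lags, len(prices))]


def chronological_split(samples, train_fraction=0.8):
    """Split without shuffling, so the test set lies after the training set in time."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical price series (not real Bitcoin data).
prices = [20000 + 100 * i for i in range(60)]
samples = make_lagged_samples(prices, lags=10)
train, test = chronological_split(samples)

# Naive persistence baseline: predict that the next price equals the latest one.
actual = [target for _, target in test]
predicted = [window[-1] for window, _ in test]
baseline_mape = mape(actual, predicted)
```

In a full experiment, the `train` windows would be fed to the LSTM, and `baseline_mape` gives a simple reference point against which a score such as the reported 3.23% could be judged.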

Citations: 0