
2022 International Conference on Data Science and Its Applications (ICoDSA): Latest Papers

EEG Emotion Recognition using Parallel Hybrid Convolutional-Recurrent Neural Networks
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862853
Nursilva Aulianisa Putri, Esmeralda Contessa Djamal, Fikri Nugraha, Fatan Kasyidi
Electroencephalogram (EEG) signals associated with particular emotions contain waves in specific frequency bands, so an emotion recognition network that processes each wave separately becomes relevant. EEG signals record electrical activity in the brain from several channels, so EEG processing must account for both spatial and temporal information: spatial information is the relationship between channels, while temporal information is the sequence over time. Previous work has used Convolutional Neural Networks (CNN) of various dimensions, Recurrent Neural Networks (RNN), and hybrid CNN-RNN models. This paper proposes a hybrid 2D CNN-RNN method that identifies emotions using a parallel network for each wave. A two-dimensional CNN extracts features across channels from a short segment of the signal; using short segments is intended to minimize the effect of the non-stationary character of EEG signals. The RNN then identifies emotions from the output of the 2D CNN. Modeling and testing used the SEED dataset, with three emotion classes: positive, neutral, and negative. The experimental results show that using a separate network for each wave increased accuracy from 80.92% to 84.71% and decreased the loss value, whereas using a 2D CNN instead of a 1D CNN gave only a less significant increase in accuracy. Evaluation of the individual waves shows that Beta and Gamma waves provided the best precision, 87-91%, and Theta waves gave 79-85% precision. The Alpha wave, a mid-band between Theta and Beta, degraded overall performance, with only 56-61% precision. Choosing a proper weight-updating technique is also necessary: Adaptive Moment (Adam) gave higher accuracy than AdaDelta, AdaGrad, and RMSprop.
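The short-time segmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the recording is stored as a channels-by-samples list of lists; the function name `split_into_windows` and the window and step sizes are hypothetical choices for illustration, not values taken from the paper.

```python
from typing import List

Signal = List[List[float]]  # channels x samples


def split_into_windows(signal: Signal, window: int, step: int) -> List[Signal]:
    """Cut a channels-by-samples recording into short channels-by-window segments."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    n_samples = len(signal[0]) if signal else 0
    segments = []
    # Each segment keeps every channel, so it forms a small 2D (channel x time) array.
    for start in range(0, n_samples - window + 1, step):
        segments.append([channel[start:start + window] for channel in signal])
    return segments


# Toy recording (made-up numbers): 2 channels, 10 samples each.
toy = [[float(i) for i in range(10)], [float(-i) for i in range(10)]]
windows = split_into_windows(toy, window=4, step=2)
print(len(windows), len(windows[0]), len(windows[0][0]))  # prints "4 2 4"
```

Under this assumption, each segment is the kind of 2D input a 2D CNN could process, and the ordered list of segments is the kind of sequence an RNN could consume.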
Citations: 0
Feature Expansion with Word2Vec for Topic Classification with Gradient Boosted Decision Tree on Twitter
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862907
Dhuhita Trias Maulidia, Erwin Budi Setiawan
Online social networks play an essential role as a source of information, especially during emergencies. One of them is Twitter, a service that allows users to send and read messages of limited length. Tweets are therefore very short, do not always use correct grammar, and contain many word variations. These variations increase the likelihood of vocabulary mismatches and make tweets difficult to understand. One solution is to expand the features of the tweet. Feature expansion on Twitter is a semantic addition that enriches the original text so that it resembles a longer text. In this study, Word2Vec is used for feature expansion, and the Gradient Boosted Decision Tree method is used for classification. The expected result of this research is to reduce word and sentence mismatches in the classification of Twitter topics, evaluated using accuracy and F1-Measure. The highest accuracy achieved by feature expansion using Word2Vec with the Gradient Boosted Decision Tree classifier is 85.44%.
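One way to read the feature expansion idea is sketched below. This is an assumption about the general technique, not the authors' implementation: a feature that is absent from a tweet is switched on when a semantically similar word is present. The `SIMILAR` table is a hypothetical stand-in for nearest neighbours that would normally come from a trained Word2Vec model.

```python
from typing import Dict, List

# Hypothetical similarity table; in practice it would be derived from Word2Vec neighbours.
SIMILAR: Dict[str, List[str]] = {
    "flood": ["flooding", "inundation"],
    "help": ["assist", "aid"],
}


def expand_features(tokens: List[str], vocabulary: List[str],
                    similar: Dict[str, List[str]]) -> List[int]:
    """Binary bag-of-words vector where a zero feature is set to 1 if a similar word occurs."""
    present = set(tokens)
    vector = []
    for feature in vocabulary:
        if feature in present:
            vector.append(1)
        elif any(word in present for word in similar.get(feature, [])):
            vector.append(1)  # expansion: a similar word stands in for the feature
        else:
            vector.append(0)
    return vector


vocab = ["flood", "help", "road"]
tweet = ["flooding", "near", "my", "house", "need", "aid"]
plain = [1 if feature in tweet else 0 for feature in vocab]
expanded = expand_features(tweet, vocab, SIMILAR)
print(plain, expanded)  # prints "[0, 0, 0] [1, 1, 0]"
```

In this toy case the plain vector is all zeros because of vocabulary mismatch, while the expanded vector recovers two features that a downstream classifier could use.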
Citations: 1
Predictive Model of Student Academic Performance in Private Higher Education Institution (Case in Undergraduate Management Program)
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862822
S. Noviaristanti, G. Ramantoko, Akas Triono Hadi, Alfi Inayati
A private university must consider many factors when accepting prospective students. Enrolled students are expected to stay until their studies are completed, perform well academically, and graduate on time. From the start of admissions, a private university therefore needs to choose which prospective students to accept in order to meet the study program's education-quality goals. This work studies the prediction class and the ranking of variable importance for students' length of stay and for academic performance, labeled as graduation. The method adopted falls under a technique called feature extraction. This study compares the ranking methods information gain and gain ratio against two other methods, χ2 and random forest. The dataset contains 7676 observations of students in a management program at a private university in Indonesia, spanning 2010 to 2021. The input data were collected from the faculty-specific department through the university's academic admissions. The results show that all techniques rank IP/GPA (IP) as the most critical feature in predicting length of stay and graduation, while Origin of High School, Selection Test Score, and Gender receive split votes. This study is unique because it sheds light on a case particular to Indonesia.
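The two ranking criteria named in the abstract, information gain and gain ratio, can be computed as in the following minimal sketch. The toy columns (`gpa_band`, `gender`, `on_time`) and their values are made up for illustration and are not taken from the study's dataset.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalised by the entropy of the feature itself (split information)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Made-up toy data: a GPA band that separates the classes and a gender column that does not.
gpa_band = ["high", "high", "low", "low"]
gender = ["f", "m", "f", "m"]
on_time = ["yes", "yes", "no", "no"]

ranking = sorted(
    {"gpa_band": gain_ratio(gpa_band, on_time), "gender": gain_ratio(gender, on_time)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Sorting features by either score gives a ranking of the kind the abstract describes; gain ratio additionally penalises features with many distinct values.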
Citations: 1
Electronic Nose and Neural Network Algorithm for Multiclass Classification of Meat Quality
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862888
Alif Firman Juannata, Dedy Rahman Wijaya, Wawa Wikusna
Meat is a food source that contains many nutrients, including fat, calories, trans fat, saturated fat, calcium, protein, vitamin D, vitamin B6, vitamin B12, and magnesium. Because of its good nutritional content, demand for meat in Indonesia has increased. However, meat safety is a concern: meat is prone to spoilage and is quickly contaminated by microbes, and the microbial population can spoil the meat. The suitability of meat for consumption is traditionally checked by looking at its texture, but this method is not very effective. Therefore, another method is used to assess the meat: an Electronic Nose (e-nose) combined with a Neural Network (NN) algorithm. The e-nose detects the odor of the meat, and the NN algorithm classifies meat quality in a structured manner from each component needed to determine it. These results can help people obtain good-quality meat. The experiment used a dataset of 2220 records. The experimental results show that the NN algorithm with the e-nose sensor achieves an accuracy of 0.92.
Citations: 0
Eye Tracking and Emotion Recognition Using Multiple Spatial-Temporal Networks
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862881
Eprian Junan Setianto, Esmeralda Contessa Djamal, Fikri Nugraha, Fatan Kasyidi
E-commerce products need to be evaluated through reader responses for a more objective assessment. Two ways of doing so are emotion expression identification and eye-tracking. Using these two variables from video capture provides a more thorough evaluation of interest and emotional response. This study proposes a spatial-temporal multi-network method using Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) on 60-second videos. The results show that two classes of emotional expression with four eye-tracking directions gave better accuracy, 95.83%, than three classes of emotion with four eye-tracking directions, 91.67%. Experiments also show that using CNN-LSTM significantly increased accuracy, while the weight correction technique had little effect. The evaluated F1 score shows the consistency of the proposed model.
Citations: 0
AB-HT: An Ensemble Incremental Learning Algorithm for Network Intrusion Detection Systems
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862833
Mahendra Data, M. Aritsugi
Most machine learning models used in network intrusion detection system (IDS) studies are batch models, which require all targeted intrusions to be present in the training data. This approach is slow because computer networks produce massive amounts of data. Furthermore, new network intrusion variants continuously emerge, and retraining the model on such extensive and evolving data takes time and resources. This study proposes AB-HT, an ensemble incremental learning algorithm for IDSs. AB-HT combines incremental Adaptive Boosting (AdaBoost) with the Hoeffding Tree algorithm. An AB-HT model can learn to detect new intrusions without being retrained on old training data, which reduces the computational resources needed for retraining while maintaining the model's performance. We compared it to an AdaBoost-Decision Tree model, a batch learning model, to analyze the effectiveness of the incremental learning approach. We then compared it to other incremental learning models, the Hoeffding Tree (HT) and Hoeffding Anytime Tree (HATT) models. The experimental results showed that the proposed incremental model had shorter training times than the AdaBoost-Decision Tree model in the long run. In addition, the AB-HT model had, on average, 18% higher F1-score values than the HT and HATT models. These advantages show that the AB-HT algorithm has promising potential for use in the IDS field.
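As general background on Hoeffding Trees (not a description of AB-HT itself, whose details the abstract does not give), the sketch below shows the standard Hoeffding bound that lets such a tree decide on a split from a stream without storing old examples. The function names and the numbers in the example are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the best attribute beats the runner-up by more than the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Made-up numbers: the same gain difference becomes decisive as more examples arrive.
for n_seen in (200, 2000):
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=n_seen)
    print(n_seen, round(eps, 4), should_split(0.30, 0.20, 1.0, 1e-7, n_seen))
```

Because the bound shrinks as the number of observed examples grows, the tree can commit to a split after seeing enough of the stream, which is what makes incremental training possible.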
Citations: 2
Diseases Video Recommender System using Keyword-Based Vector Space on Youtube and Vimeo
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862826
Saskia Putri Ananda, Z. Baizal
Digital health solutions can take various forms, one of which is searching for information on the internet. However, when someone uses a search engine, the videos displayed are selected by keywords only, without considering what kind of videos the user likes. Meanwhile, when searching on YouTube, the recommended videos are only those found on YouTube, so the range of recommendations is limited. To overcome this problem, we build a web-based recommender system for videos about diseases that is better organized and draws on a wider range of videos from YouTube and Vimeo. The system recommends videos based not only on the searched keywords but also on videos that users have liked. The YouTube and Vimeo APIs are used to retrieve videos about the disease being searched for. We use content-based filtering for the recommendation process. The keyword-based vector space performs three tasks: 1) it converts the title and description of a video into a vector space, 2) it calculates the cross product of the term frequency, and 3) it determines the proximity of titles using cosine similarity. The test results show an average performance of 92.67% with respect to the goals of the recommender system, namely novelty and relevance.
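The vector-space steps listed in the abstract can be illustrated with a minimal sketch of content-based ranking. It assumes plain whitespace tokenisation and raw term counts, and it uses the dot product of term-frequency vectors inside the cosine similarity; the video titles are made up, and no YouTube or Vimeo API call is shown.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_frequency(text: str) -> Counter:
    """Turn a title or description into a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(liked: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Order candidate videos by similarity to a video the user liked."""
    liked_vec = term_frequency(liked)
    scored = [(c, cosine_similarity(liked_vec, term_frequency(c))) for c in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical titles for illustration only.
ranked = rank_candidates(
    "diabetes diet tips",
    ["guitar lesson basics", "diabetes diet guide"],
)
print(ranked)
```

A production system would likely weight terms (for example with TF-IDF) and remove stop words; raw counts are used here only to keep the sketch short.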
Citations: 0
Forecast of Aviation Traffic in Indonesia Based on Google Trend and Macroeconomic Data using Long Short-Term Memory
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862894
Muhammad Khanif Khafidli, A. Choiruddin
The COVID-19 pandemic has affected many sectors. In the aviation sector, for example, flight traffic fell drastically with no certainty of recovery. This calls for a methodology to predict flight traffic in order to support strategic planning of flight schedule operations, route structuring, and flight navigation service cost determination. However, current developments mainly focus on forecasting flight traffic from historical data without considering external factors. In this study, we propose the Long Short-Term Memory (LSTM) technique to forecast flight traffic in Indonesia using external variables such as macroeconomic variables and Google Trends. LSTM is proposed because of its flexibility in modeling non-linear time series data and its good reputation for predictive accuracy. We first select a few Google Trends and macroeconomic variables using nonlinearity analysis and the cross-correlation function (CCF). We then use the selected variables to forecast flight traffic and compare the result to a forecast that uses only historical flight traffic data. Based on the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), the model involving Google Trends outperforms the other three models, i.e., the model with only historical data, the model with macroeconomic variables, and the model with both macroeconomic variables and Google Trends. This is because, in this digital era, Google Trends can reflect population psychology in an up-to-date manner.
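The two error measures used to compare the forecasting models, RMSE and MAPE, can be computed as in this minimal sketch. The traffic figures are made-up numbers for illustration and do not come from the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly flight counts and two hypothetical forecasts.
actual = [100.0, 200.0, 400.0]
forecast_a = [110.0, 190.0, 400.0]
forecast_b = [130.0, 170.0, 360.0]

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, round(rmse(actual, forecast), 3), round(mape(actual, forecast), 3))
```

RMSE is expressed in the units of the series and penalises large misses more heavily, while MAPE is scale-free, which is one reason the two are often reported together.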
Citations: 0
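The aviation-traffic abstract above ranks its forecasting models by Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). As a minimal sketch of how those two metrics are computed, the snippet below uses the standard textbook definitions. The function names and the sample numbers are hypothetical illustrations, not values or code from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical monthly flight counts and forecasts (illustrative only).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(f"RMSE: {rmse(actual, forecast):.4f}")
print(f"MAPE: {mape(actual, forecast):.4f}%")
```

RMSE is expressed in the units of the series and penalizes large misses more heavily, while MAPE is scale-free, which is presumably why the two are reported together when comparing models.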
The Influence of Sentiment on the Movement of Bank Mandiri (BMRI) Stock Price with Word2Vec Feature Expansion and the Naïve Bayes-Support Vector Machine (NBSVM) Classifier
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862919
Ridhwan Nashir, E. B. Setiawan, D. Adytia
Sentiment towards a company is suspected of influencing the company's stock price movement. The sentiment is gathered from Twitter, YouTube, and Facebook, along with news media that discussed Bank Mandiri, such as Consumer News and Business Channel (CNBC), Kontan, Detik, Cable News Network (CNN), Stockbit, and Liputan6. Word2Vec word embeddings are used to reduce vocabulary errors in sentiment analysis. The Word2Vec model was built from a combined corpus of Wikipedia articles and scraped data totaling 474,277 lines of text. The study finds a weak positive correlation between sentiment and Bank Mandiri's stock movements, with Spearman rank correlation coefficients of 0.138 and 0.123 for positive and negative sentiment, respectively. The Naïve Bayes-Support Vector Machine (NBSVM) classification model outperforms the Naïve Bayes and Support Vector Machine methods: the baseline NBSVM reaches an accuracy of 64.67%, and after feature expansion the accuracy rises to 70.42%, an increase of 5.75 percentage points. The study concludes that sentiment is correlated with the movement of Bank Mandiri's shares and that Word2Vec feature expansion can increase the model's accuracy.
Citations: 1
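The Bank Mandiri abstract above reports Spearman rank coefficients of 0.138 and 0.123. As a rough sketch of what such a coefficient measures, the snippet below ranks two series (averaging ranks across ties) and then takes the Pearson correlation of the ranks, which is the usual definition of Spearman's rho. The helper names and the sample data are hypothetical, and this is not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily sentiment scores and price changes (illustrative only).
sentiment = [1, 2, 3, 4, 5]
price_change = [2, 1, 4, 3, 5]
print(f"Spearman rho: {spearman(sentiment, price_change):.3f}")
```

Because only ranks enter the calculation, the coefficient captures monotonic association rather than linear fit; values near 0.1, like those in the abstract, indicate a weak relationship.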
What Affects User Satisfaction of Payroll Information Systems?
Pub Date : 2022-07-06 DOI: 10.1109/ICoDSA55874.2022.9862938
R. V. Priyanka Hafidz, Amelia Setiawan
Ongoing digital developments can affect several parts of an institution or company by moving its systems from manual to online. Government institutions have likewise begun to implement computerized systems, and payroll is one of the sections affected. For example, pay that was usually handed over in person can now be sent by bank transfer. This study analyzes the effect of information quality, system quality, and information system security on user satisfaction in the payroll section of a government institution, namely the Financial Transaction Reports and Analysis Center. The data were processed in two ways: first with the Data Analysis function of Microsoft Excel, and second with SEM-PLS analysis to test the three predetermined hypotheses. The hypothesis tests indicate that information quality and information system security significantly affect user satisfaction, while system quality does not. The limitation of this research is the small number of employees in the payroll section of the Financial Transaction Reports and Analysis Center. Further research could examine a more general section with more employees.
Citations: 0