2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA): Latest Papers

Forecasting Acceleration of Data Transfer with Fog Computing for Resource Efficiency in Data Centers
N. Zendrato, M. Zarlis, O. S. Sitompul, E. M. Zamzami
Accelerating data transfer is a persistent problem in fog computing, especially for data center workloads. This research predicts server performance data in fog computing using linear regression. The variables that affect data transfer speed, namely the number of CPU cores, CPU capacity, and memory used, serve as attributes, and data transfer serves as the label. With this approach, data transfer speed can be predicted before use. The method improves the error value compared with other forecasting methods, so data transfer in fog computing can be more effective and efficient.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190326 (published 2020-07-01)
Citations: 0
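As a rough illustration of the setup this abstract describes (three attributes, one label, linear regression), the sketch below fits ordinary least squares with the normal equations. The sample values, the units, and the function names `fit_linear_regression` and `predict` are hypothetical and are not taken from the paper; the labels are generated from a known linear rule only so that the fit can be checked.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    # Prepend a constant 1.0 to each row so weights[0] acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(weights, row):
    """Apply the fitted intercept and coefficients to one attribute row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Hypothetical samples: (CPU cores, CPU capacity in GHz, memory used in GB).
features = [
    (2, 2.0, 4), (4, 2.5, 8), (8, 3.0, 16),
    (4, 3.5, 6), (16, 2.2, 32), (8, 2.8, 12),
]
# Synthetic labels from a known linear rule, standing in for measured transfer speed.
transfer = [5 + 3 * c + 10 * g + 0.5 * m for c, g, m in features]

weights = fit_linear_regression(features, transfer)
estimate = predict(weights, (6, 2.6, 10))
```

With real measurements the fit would not be exact, and the error metric reported in the paper would be computed on held-out data.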
Comparative Analysis of the Kruskal and Boruvka Algorithms in Solving Minimum Spanning Tree on Complete Graph
D. Rachmawati, Herriyance, Frederik Yan Putra Pakpahan
A problem often encountered in daily life is connecting all points in a work domain at the lowest total cost, for example, finding the most economical way to connect a water pipe to every house in an area. Solving it requires a system that finds a set of connections linking all points at minimum cost. In this study, the system was built using two algorithms, Kruskal and Boruvka, with a complete graph as the model of the problem. Using these two algorithms, the system finds the minimum spanning tree that connects all vertices of the complete graph and then displays a comparison of the two algorithms. The data is dynamic: users can enter and change the edge weights of the complete graph as needed. Tests show that the Kruskal algorithm is more effective than the Boruvka algorithm at finding the minimum spanning tree of a complete graph with 15 vertices and 105 edges.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190504 (published 2020-07-01)
Citations: 3
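The two algorithms compared in this abstract can be sketched side by side. This is a minimal textbook version, not the authors' system: the `DisjointSet` helper, the random edge weights, and the seed are assumptions made for illustration. The graph size (15 vertices, 105 edges) matches the test case the abstract mentions.

```python
import random
from itertools import combinations


class DisjointSet:
    """Union-find structure used by both algorithms to track components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Scan edges by increasing weight, keeping those that join two components."""
    dsu = DisjointSet(n)
    return sum(w for w, u, v in sorted(edges) if dsu.union(u, v))


def boruvka(n, edges):
    """Each round, every component selects its cheapest outgoing edge."""
    dsu = DisjointSet(n)
    total, components = 0, n
    while components > 1:
        cheapest = {}
        for edge in edges:
            w, u, v = edge
            ru, rv = dsu.find(u), dsu.find(v)
            if ru == rv:
                continue
            for root in (ru, rv):
                # Comparing whole (w, u, v) tuples breaks weight ties consistently.
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # the graph is disconnected
        for w, u, v in cheapest.values():
            if dsu.union(u, v):
                total += w
                components -= 1
    return total


# Complete graph on 15 vertices (105 edges) with hypothetical random weights.
rng = random.Random(0)
n = 15
edges = [(rng.randint(1, 100), u, v) for u, v in combinations(range(n), 2)]
kruskal_weight = kruskal(n, edges)
boruvka_weight = boruvka(n, edges)
```

Both functions return the same total weight on any connected graph; a comparison like the one in the paper would additionally time the two calls.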
Emotion Analysis and Classification of Movie Reviews Using Data Mining
Kamoltep Moolthaisong, Wararat Songpan
This paper proposes a model for classifying movie reviews using data mining. It also proposes a method for building a word cloud from word frequencies in the reviews, to help analyze the topics of interest and the opinions of reviewers. The research uses movie review data from the Metacritic website. The data consists of reviews of 21 movies, split into a training set of 462 reviews and a test set of 238 reviews. Data preparation began with collecting the review data, cleaning special symbols and letter case, and loading the result into Weka. The review text was converted into structured data with the StringToWordVector filter. This step removes stop words using the Rainbow stop-word list, reduces words that share a root to their stem with the Snowball stemmer, and weights terms with TF-IDF. Naïve Bayes, Random Forest, and J48 were then used to classify the reviews as positive or negative. The experimental results were 80.25%, 79.83%, and 68.06%, respectively.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190363 (published 2020-07-01)
Citations: 2
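The pre-processing pipeline in this abstract (symbol and case cleanup, stop-word removal, TF-IDF weighting) can be approximated in a few lines. This sketch does not call Weka: it uses the textbook weight tf × log(N/df), which may differ from what the StringToWordVector filter computes, it omits stemming, and the stop-word set and review texts are made up for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stop-word set; this is not the Rainbow list.
STOP_WORDS = {"the", "a", "was", "and", "is", "this"}


def tokenize(text):
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical review snippets.
reviews = [
    "The plot was brilliant and the acting was brilliant!",
    "The plot was dull.",
    "This acting is wooden and dull.",
]
vectors = tf_idf(reviews)
```

The resulting dictionaries play the role of the structured feature vectors that a classifier such as Naïve Bayes would consume.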
An Efficient Text Classification Using fastText for Bahasa Indonesia Documents Classification
A. Amalia, O. S. Sitompul, E. Nababan, T. Mantoro
Text classification using a simple word representation with a linear classifier is often considered a strong baseline. However, a simple word representation such as Bag of Words (BOW) suffers from the curse of dimensionality, so it is only suitable for small datasets. BOW also needs language-dependent pre-processing steps such as stop-word removal and stemming, so the BOW model cannot be applied automatically across languages. Deep neural network classifiers, on the other hand, can eliminate the pre-processing prerequisite, but they are slow to train and need a large dataset. This is a challenge for languages with limited resources, such as Bahasa Indonesia. Another approach is to use the fastText model for text classification. This model minimizes pre-processing dependencies and trains more efficiently. However, there has been little study of whether the fastText model outperforms the BOW model on small datasets. This paper compares text classification using the TF-IDF model, as one of the BOW models, with a fastText model on 500 news articles in Bahasa Indonesia. Both models achieve an outstanding performance of 0.97 F-score. The TF-IDF model needs longer pre-processing stages and more training time, whereas the fastText model only needs some hyperparameter tuning to reach similar performance. Based on this study, we conclude that the fastText model is an efficient text classifier.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190447 (published 2020-07-01)
Citations: 6
Dijkstra's and A-Star in Finding the Shortest Path: a Tutorial
Ade Candra, M. A. Budiman, Kevin Hartanto
As a form of greedy algorithm, Dijkstra's can find the shortest path with an optimum result, at the cost of a longer search time. A-Star, a best-first search algorithm, is the opposite: it finds a path faster, but the result is not always optimum. By looking at the advantages and disadvantages of Dijkstra's and A-Star, this tutorial discusses the implementation of the two algorithms for route selection between 24 SPBU (gas stations). The routes are located in Medan City and represented as a directed graph. The authors also compare Dijkstra's and A-Star in terms of Big-Theta (Θ) complexity and running time. The results show that the shortest-path search between SPBU can be solved with both Dijkstra's and A-Star; in some cases the two algorithms produce different routes, so the total distances also differ. The running time of A-Star is shown to be faster than Dijkstra's, in line with the A-Star principle of selecting the next location based on the best heuristic value, which Dijkstra's does not do. For the complexity, Dijkstra's is $\Theta(\mathrm{n}^{2})$ and A-Star is $\Theta(\mathrm{m} \ast \mathrm{n})$, where $0 \leq \mathrm{m} \leq \mathrm{n}$.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190342 (published 2020-07-01)
Citations: 23
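The relationship between the two algorithms in this tutorial can be shown with one search routine: with a zero heuristic it behaves as Dijkstra's, and with a distance-to-goal heuristic it behaves as A-Star. The sketch below uses a binary heap rather than the array-based version implied by the quoted complexity, and the node names, coordinates, and edges are hypothetical, not the SPBU data from the paper. The straight-line heuristic here never overestimates, so both runs return the same optimum route; a heuristic that overestimates could produce the differing routes the abstract reports.

```python
import heapq
import math


def shortest_path(graph, start, goal, heuristic=lambda node: 0.0):
    """Best-first search on f = g + h; with h == 0 this is Dijkstra's algorithm."""
    best = {start: 0.0}
    parent = {start: None}
    frontier = [(heuristic(start), 0.0, start)]
    expanded = 0
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > best[node]:
            continue  # stale queue entry superseded by a cheaper route
        expanded += 1
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return best[goal], path[::-1], expanded
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(
                    frontier, (new_cost + heuristic(neighbor), new_cost, neighbor)
                )
    return math.inf, [], expanded


# Hypothetical locations (x, y) and directed roads weighted by straight-line length.
coords = {"A": (0, 0), "B": (2, 0), "C": (4, 0), "D": (0, 2), "E": (0, 4)}


def distance(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x1 - x2, y1 - y2)


roads = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C")]
graph = {}
for u, v in roads:
    graph.setdefault(u, []).append((v, distance(u, v)))

d_cost, d_path, d_expanded = shortest_path(graph, "A", "C")
a_cost, a_path, a_expanded = shortest_path(
    graph, "A", "C", heuristic=lambda node: distance(node, "C")
)
```

On this small graph the heuristic lets A-Star skip the detour through D, which is the source of the running-time advantage the tutorial describes.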
The Use of Meteorology Data in Short-Term Prediction of Wind Speed for Wind Turbine Using Elman Recurrent Neural Network
R. Dinzi, Muhammad Yusuf, F. Fahmi
Wind energy is one of the promising renewable energy sources that are ideal for daily use, especially in areas with sufficient wind, such as Indonesia. Wind speed is the driving force for wind turbines to produce electrical power. One problem in wind turbine management is predicting short-term wind speed for efficiency. In this research, short-term wind speed in the city of Sibolga was forecast using an Elman recurrent neural network based on meteorological data (temperature, humidity, and air pressure) to predict the next ten days. Four prediction models were developed, differing in training parameters and the dataset used. The forecasts produce MAPE values of 20.02% for the first model, 23.31% for the second, 18.15% for the third, and 12.51% for the fourth. The fourth model predicted with the lowest error and is therefore considered useful for wind turbine management.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190628 (published 2020-07-01)
Citations: 1
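The four models in this abstract are ranked by MAPE (mean absolute percentage error). A minimal sketch of that metric follows; the wind-speed values are invented for illustration and do not come from the Sibolga dataset.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast wind speeds in m/s.
observed = [4.0, 5.0, 8.0, 10.0]
forecast = [5.0, 4.0, 8.0, 9.0]
error_percent = mape(observed, forecast)
```

Note that the metric is undefined when an observed value is zero, which matters for calm periods in wind data.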
Accuracy Analysis on Images Retrieval System using Radial Basis Function Algorithm and Coefficient Correlation
Khairul Abdi Sinuraya, S. Suwilo, M. S. Lydia
An image retrieval system retrieves images based on information contained in the image files. Radial Basis Function (RBF), one of the neural network methods used in image retrieval, is known for its ability to search image information accurately. To determine the initial centroid value, the RBF method uses K-Means clustering. This algorithm has a weakness in determining the right initial centroid value for proper classification results in image retrieval. In this paper, the Coefficient Correlation (CC) method is used to determine the initial centroid value from the similarity of the input data: the data item with the highest degree of similarity to the other data is used as the initial centroid. The data used in this study are 500 leaf images in 10 categories of leaf types, with 50 images per category. In testing, image retrieval with the RBF and CC methods reached an average accuracy of 90.92%, compared with 85.96% for the RBF and K-Means clustering methods.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190227 (published 2020-07-01)
Citations: 1
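One possible reading of the centroid selection described in this abstract is sketched below: compute the Pearson correlation between every pair of feature vectors and take the vector with the highest mean correlation to the rest as the initial centroid, then use it in a Gaussian radial basis activation. The abstract does not give the exact procedure, so the selection rule, the `sigma` parameter, and the sample vectors are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pick_initial_centroid(samples):
    """Index of the sample with the highest mean correlation to the others."""
    def mean_correlation(i):
        others = [pearson(samples[i], s) for j, s in enumerate(samples) if j != i]
        return sum(others) / len(others)
    return max(range(len(samples)), key=mean_correlation)


def rbf(x, centroid, sigma=1.0):
    """Gaussian radial basis activation of x around a centroid."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(x, centroid))
    return math.exp(-squared_distance / (2 * sigma ** 2))


# Hypothetical image feature vectors.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [1.0, 3.0, 2.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
]
centroid_index = pick_initial_centroid(samples)
activations = [rbf(s, samples[centroid_index], sigma=2.0) for s in samples]
```

The sample chosen as centroid has activation 1.0, and the other activations fall toward 0 as their distance from it grows.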
Implementing Cosine Similarity Algorithm to Increase the Flexibility of Hematology Text Report Generation
Aulia Amirullah, I. Aulia, Dedy Arisandy
The previous hematology textual summary system, which applies a template-based Natural Language Generation method to express hematology laboratory test results in natural language, was at the cutting edge of generating more detailed hematology reports. Its reports break down the critical and abnormal blood components found in conventional hematology test results, and aim to help patients easily identify which blood components are abnormal. Templates provide slots in each generated sentence that are filled with the supplied data. However, the previous system, named T-Gen System, can only produce fixed, inflexible slots for the blood components that the system defines. It barely got off the ground because it is very inflexible: the templates cannot hold all of the critical and abnormal components found in a laboratory examination result. Therefore, this research project implements the cosine similarity algorithm to expand template flexibility. Testing and evaluation were carried out manually by adding components to the system one after another. The tests show that every blood component added this way successfully appeared in the generated text.
DOI: https://doi.org/10.1109/DATABIA50434.2020.9190549 (published 2020-07-01)
Citations: 1
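Cosine similarity, the measure this abstract relies on, can be sketched over word-count vectors. The abstract does not say exactly how the measure is wired into the templates, so the matching step below (choosing the template label closest to a component name) is only one plausible use, and the labels are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two strings."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical template labels and an incoming component name.
template_labels = [
    "white blood cell count",
    "red blood cell count",
    "platelet count",
    "hemoglobin level",
]
component = "white blood cell"
best_label = max(template_labels, key=lambda label: cosine_similarity(component, label))
```

Because the score depends only on shared words, a new component name can be matched to a template without adding a fixed slot for it in advance.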
A Survey on The Accuracy of Machine Learning Techniques for Intrusion and Anomaly Detection on Public Data Sets
R. T. Adek, M. Ula
Machine learning (ML) is growing in popularity due to its ability to solve problems in many areas. In the digital world, including information security, some intrusion detection systems (IDS) are being upgraded with machine learning elements to improve performance. Real data sets available for information security (IS) research are known to be very limited, so much IS research relies on public data sets. However, public data sets have many limitations. The aim of this paper is to analyze the accuracy and performance of machine learning in intrusion detection systems and to highlight some recommendations for future research. The study is a systematic literature review of academic papers on intrusion detection that apply machine learning methods to public data sets. It elaborates on the use of machine learning algorithms in intrusion detection systems, highlighting the accuracy and limitations of the methods for detecting attackers. The goal is to provide an academic base for future research on adopting machine learning methods for IDS.
机器学习(ML)越来越受欢迎,因为它们能够解决许多领域的问题。在包括信息安全在内的数字世界中,一些入侵检测系统(IDS)正在升级机器学习元素,以提高系统的性能。众所周知,可用于信息安全研究的真实数据集非常有限。因此,许多IS研究依赖于公共数据集。然而,公共数据集有许多局限性。本文的目的是分析机器学习在入侵检测系统中的准确性和性能,并对未来的研究提出一些建议。本研究涉及一篇学术论文,系统地综述了使用公共数据集的机器学习方法应用的入侵检测相关文献。本文阐述了机器学习算法在入侵检测系统中的应用,强调了检测攻击者方法的准确性和局限性。本研究的目的是为未来在IDS中采用机器学习方法的研究提供一个学术基础。
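As a rough illustration of what "accuracy" means when intrusion detectors are compared on a labelled public data set, the sketch below computes the usual confusion-matrix metrics. It is not taken from the surveyed papers: the function name `detection_metrics`, the 0/1 label encoding (1 = attack, 0 = normal traffic), and the sample labels are all assumptions made for the example.

```python
def detection_metrics(y_true, y_pred):
    """Compute basic IDS evaluation metrics from binary labels.

    Assumed encoding (hypothetical): 1 = attack, 0 = normal traffic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can look high on data sets where attacks are rare, which is one reason recall and the false alarm rate are usually reported alongside it.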
DOI: 10.1109/DATABIA50434.2020.9190436 (https://doi.org/10.1109/DATABIA50434.2020.9190436)
Published: 2020-07-01
Citations: 8
Performance Analysis of FIFO and Round Robin Scheduling Process Algorithm in IoT Operating System for Collecting Landslide Data
Hayatunnufus, M. Riasetiawan, A. Ashari
Scheduling is one of the most important factors in managing processes inside the CPU. CPU scheduling is a multiprogramming concept in which the CPU serves incoming processes alternately. Many algorithms can be used to schedule processes inside the CPU, but not all of them are suitable for real-time use. Long waiting times and response times are often problems when scheduling processes in real time. The FIFO and Round Robin algorithms can be implemented to schedule processes in real time. In this paper, the authors schedule the processes of several sensors used to collect landslide data. The sensor data processing is dispatched under FIFO scheduling and Round Robin scheduling separately on an Internet of Things (IoT) device. The authors analyze only the performance of the FIFO and Round Robin algorithms in scheduling incoming processes in real time on the IoT operating system, considering waiting time and response time. The analysis is expected to identify which algorithm gives short response and waiting times, so that a suitable algorithm can be chosen to complement the IoT architecture for landslide detection. The FIFO and Round Robin algorithms are implemented on the Raspbian and Arch Linux operating systems on the IoT device, a Raspberry Pi 3 Model B, which uses a 64-bit ARM Cortex-A53 processor at 1.2 GHz.
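The two metrics named in the abstract, waiting time and response time, can be illustrated with a small simulation. This is a textbook-style sketch, not the authors' implementation: it assumes all processes arrive at time 0, ignores context-switch cost, and the function names `fifo` and `round_robin`, the burst times, and the quantum are hypothetical values chosen for the example.

```python
from collections import deque


def fifo(bursts):
    """FIFO: each process runs to completion in arrival order.

    Returns (waiting, response) lists; with no preemption they are equal.
    Assumes every process arrives at t = 0.
    """
    waiting, response, clock = [], [], 0
    for burst in bursts:
        waiting.append(clock)   # time spent queued before running
        response.append(clock)  # time until first CPU access
        clock += burst
    return waiting, response


def round_robin(bursts, quantum):
    """Round Robin: each process runs for at most `quantum` per turn.

    Returns (waiting, response) lists. Assumes every process arrives at t = 0.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    n = len(bursts)
    remaining = list(bursts)
    response = [None] * n
    finish = [0] * n
    queue = deque(range(n))
    clock = 0
    while queue:
        i = queue.popleft()
        if response[i] is None:
            response[i] = clock  # first time this process gets the CPU
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)      # not finished: back of the queue
        else:
            finish[i] = clock
    # Waiting time = turnaround time minus the time actually spent running.
    waiting = [finish[i] - bursts[i] for i in range(n)]
    return waiting, response


# Hypothetical burst times (in ms) for three sensor-reading processes.
bursts = [5, 3, 8]
print("FIFO:", fifo(bursts))
print("RR  :", round_robin(bursts, quantum=2))
```

With these sample values, Round Robin lowers the response time of later processes at the cost of a higher average waiting time, which is the kind of trade-off such a comparison is meant to expose.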
DOI: 10.1109/DATABIA50434.2020.9190608 (https://doi.org/10.1109/DATABIA50434.2020.9190608)
Published: 2020-07-01
Citations: 2