2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA): Latest Articles


Forecasting Acceleration of Data Transfer with Fog Computing for Resource Efficiency in Data Centers

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190326

N. Zendrato, M. Zarlis, O. S. Sitompul, E. M. Zamzami

Accelerating data transfer is a persistent problem in fog computing, especially for data center workloads. This research predicts server performance data in fog computing using linear regression. The number of CPU cores, CPU capacity, and memory used, all of which affect data transfer speed, serve as attributes, and data transfer serves as the label. With this approach, data transfer speed can be predicted before use. The method yields a lower error value than other forecasting methods, so data transfer in fog computing can be made more effective and efficient.


Citations: 0

Comparative Analysis of the Kruskal and Boruvka Algorithms in Solving Minimum Spanning Tree on Complete Graph

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190504

D. Rachmawati, Herriyance, Frederik Yan Putra Pakpahan

A problem often encountered in daily life is connecting all points in a work domain at the lowest total cost, for example, finding the most economical way to connect a water pipe to each house in an area. Solving this problem requires a system that can find a set of connections linking all points in the domain at minimum cost. In this study, the system was built using two algorithms, Kruskal's and Boruvka's, with a complete graph as the model of the problem. Using these two algorithms, the system finds the optimum set of edges (the minimum spanning tree) connecting all points in the complete graph and displays a comparison of the two algorithms. The data is dynamic, meaning users can enter and change the edge weights of the complete graph as needed. The tests show that Kruskal's algorithm is more effective than Boruvka's at finding the minimum spanning tree in a complete graph with 15 vertices and 105 edges.
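The comparison described in this abstract can be sketched in a few lines. The following is an illustrative sketch only, not the authors' system: the random edge weights, the seed, and the helper names (`DisjointSet`, `kruskal`, `boruvka`) are assumptions made for the example. It builds a complete graph on 15 vertices, which has 15 × 14 / 2 = 105 edges, and checks that both algorithms return a spanning tree of the same total weight.

```python
import random
from itertools import combinations


class DisjointSet:
    """Union-find structure used by both algorithms to track components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected; adding the edge would form a cycle
        self.parent[ra] = rb
        return True


def kruskal(n, edges):
    """Scan edges in increasing weight order, keeping those that join components."""
    dsu, total, tree = DisjointSet(n), 0, []
    for w, u, v in sorted(edges):
        if dsu.union(u, v):
            total += w
            tree.append((w, u, v))
    return total, tree


def boruvka(n, edges):
    """Each round, every component picks its cheapest outgoing edge."""
    dsu, total, tree = DisjointSet(n), 0, []
    while len(tree) < n - 1:
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = dsu.find(u), dsu.find(v)
            if ru == rv:
                continue
            for root in (ru, rv):
                # Comparing whole (weight, u, v) tuples breaks weight ties consistently.
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # graph is disconnected
        for w, u, v in cheapest.values():
            if dsu.union(u, v):
                total += w
                tree.append((w, u, v))
    return total, tree


# Hypothetical test data: a complete graph on 15 vertices with random weights.
N = 15
rng = random.Random(42)
edges = [(rng.randint(1, 100), u, v) for u, v in combinations(range(N), 2)]

k_total, k_tree = kruskal(N, edges)
b_total, b_tree = boruvka(N, edges)
print(len(edges), len(k_tree), k_total == b_total)
```

Both functions return the same total weight because the minimum spanning tree weight is unique even when the tree itself is not; any difference between the algorithms shows up only in running time.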


Citations: 3

Emotion Analysis and Classification of Movie Reviews Using Data Mining

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190363

Kamoltep Moolthaisong, Wararat Songpan

This paper proposes a model for classifying movie reviews using data mining. It also proposes a method for creating a word cloud from word frequencies in movie reviews, to help analyze the topics of interest and the opinions of reviewers. The research uses movie review data from the Metacritic website. The data consists of reviews of 21 movies, split into a training set of 462 reviews and a test set of 238 reviews. Data preparation began with collecting the review data, removing special symbols and case distinctions, and loading it into the Weka program. The review text was then converted into structured data with the StringToWordVector filter. This step removes stop words using the Rainbow stop-word list, reduces words with the same root to their stem using the Snowball stemmer, and assigns weights using the TF-IDF technique. Naïve Bayes, Random Forest, and J48 were then used to classify the reviews into positive and negative groups. The experimental results were 80.25%, 79.83%, and 68.06%, respectively.
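The TF-IDF weighting step mentioned above can be illustrated with a small standard-library sketch. This is one common textbook variant (raw term count times log(N / document frequency)) and is an assumption for illustration; it is not claimed to match the exact formula used by Weka's StringToWordVector filter, and the two example reviews are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = term count in the document * log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Hypothetical reviews, for illustration only.
reviews = ["great movie great cast", "boring movie"]
weights = tf_idf(reviews)
print(weights[0])
```

A term that occurs in every review, such as "movie" here, gets weight zero, which is why TF-IDF helps a classifier focus on words that distinguish one review from another.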


Citations: 2

An Efficient Text Classification Using fastText for Bahasa Indonesia Documents Classification

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190447

A. Amalia, O. S. Sitompul, E. Nababan, T. Mantoro

Text classification using a simple word representation with a linear classifier is often considered a strong baseline. However, a simple word representation like Bag of Words (BOW) suffers from the curse of dimensionality, so it is only suitable for small datasets. BOW also needs language-dependent pre-processing steps such as stop-word removal and stemming, so a BOW model cannot be applied automatically across languages. Deep neural network classifiers, on the other hand, can eliminate the pre-processing prerequisite, but they are not time-efficient and need a large dataset for learning. This is a challenge for languages with limited resources, such as Bahasa Indonesia. Another approach is to use the fastText model for text classification. This model minimizes pre-processing dependencies and is more efficient in training time. However, there has been little study of whether fastText outperforms BOW on small datasets. This paper compares text classification using the TF-IDF model, as one of the BOW models, with a fastText model on 500 news articles in Bahasa Indonesia. Both models achieved an outstanding F-score of 0.97. The TF-IDF model needs longer pre-processing stages and more training time, while the fastText model only needs some hyperparameter tuning to reach similar performance. Based on this study, we conclude that fastText is an efficient text classifier.


Citations: 6

Dijkstra's and A-Star in Finding the Shortest Path: a Tutorial

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190342

Ade Candra, M. A. Budiman, Kevin Hartanto

As a form of greedy algorithm, Dijkstra's algorithm finds the shortest path with an optimum result but a longer search time. In contrast, A-Star, a best-first search algorithm, finds a path faster, but the result is not always optimum. Considering the advantages and disadvantages of both, this tutorial discusses the implementation of the two algorithms for finding the shortest route between 24 SPBU (gas stations). The routes are located in Medan City and represented as a directed graph. The authors also compare Dijkstra's and A-Star in terms of Big-Theta (Θ) complexity and running time. The results show that both algorithms can solve the shortest-path search between SPBU, although in some cases they produce different routes and therefore different total distances. In this case, A-Star ran faster than Dijkstra's, consistent with the A-Star principle of selecting the next location based on the best heuristic value, which Dijkstra's does not do. In terms of complexity, Dijkstra's is $\Theta(\mathrm{n}^{2})$ and A-Star is $\Theta(\mathrm{m}\ast \mathrm{n})$, where $0\leq \mathrm{m}\leq \mathrm{n}$.
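The relationship between the two algorithms can be sketched with a single search routine. This is a minimal illustration, not the tutorial's implementation: the four-node directed graph and both heuristic tables are hypothetical. With a zero heuristic the routine behaves as Dijkstra's algorithm. A-Star is guaranteed to return an optimal route when its heuristic never overestimates the remaining distance; with an overestimating heuristic it can return a longer route, which is one way the two algorithms can end up with different total distances.

```python
import heapq


def a_star(graph, start, goal, h=lambda node: 0):
    """Best-first search on {node: [(neighbor, cost), ...]}.

    Returns (total_cost, path), or None if the goal is unreachable.
    With the default zero heuristic this is Dijkstra's algorithm.
    """
    best = {start: 0}
    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path)
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best.get(node, float("inf")):
            continue  # stale entry: a cheaper way to this node was found later
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best.get(nxt, float("inf")):
                best[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None


# Hypothetical directed graph; the shortest A -> D route is A, B, C, D with cost 4.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}

dijkstra_result = a_star(graph, "A", "D")

# Admissible heuristic: never larger than the true remaining distance.
admissible = {"A": 3, "B": 2, "C": 1, "D": 0}
admissible_result = a_star(graph, "A", "D", h=admissible.get)

# Overestimating heuristic for C: steers the search away from the best route.
overestimate = {"A": 0, "B": 0, "C": 10, "D": 0}
overestimate_result = a_star(graph, "A", "D", h=overestimate.get)

print(dijkstra_result, admissible_result, overestimate_result)
```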


Citations: 23

The Use of Meteorology Data in Short-Term Prediction of Wind Speed for Wind Turbine Using Elman Recurrent Neural Network

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190628

R. Dinzi, Muhammad Yusuf, F. Fahmi

Wind energy is a promising renewable energy source well suited to daily use, especially in areas with sufficient wind such as Indonesia. Wind speed is the driving force for wind turbines to produce electrical power. One problem in wind turbine management is predicting short-term wind speed for efficiency. In this research, short-term wind speed in the city of Sibolga was forecast using an Elman recurrent neural network based on meteorological data (temperature, humidity, and air pressure) to predict the next ten days. Four prediction models were developed, differing in the training parameters and dataset used. The wind speed forecasts produced MAPE values of 20.02% for the first model, 23.31% for the second, 18.15% for the third, and 12.51% for the fourth. The fourth model predicted with the lowest error and is therefore considered useful for wind turbine management.
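For readers unfamiliar with the error measure reported above, MAPE (mean absolute percentage error) is the average of |actual - predicted| / |actual|, expressed as a percentage. The short sketch below shows the calculation; the wind-speed values are made-up numbers for illustration and are not taken from the paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast wind speeds (m/s).
observed = [4.0, 5.0, 2.0]
forecast = [3.0, 5.5, 2.5]

# Relative errors are 0.25, 0.10 and 0.25, so the mean is 0.20, i.e. about 20%.
error = mape(observed, forecast)
print(round(error, 2))
```

Because each error is divided by the observed value, MAPE becomes unstable when observed wind speeds are close to zero, which is worth keeping in mind when comparing models on calm days.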


Citations: 1

Accuracy Analysis on Images Retrieval System using Radial Basis Function Algorithm and Coefficient Correlation

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190227

Khairul Abdi Sinuraya, S. Suwilo, M. S. Lydia

An image retrieval system retrieves images based on information contained in the image files. Radial Basis Function (RBF), one of the neural network methods used in image retrieval, is known for its ability to search image information well. To determine the initial centroid values, the RBF method uses K-Means clustering. This algorithm has a weakness in determining the right initial centroid value needed for proper classification results in image retrieval. In this paper, the Coefficient Correlation (CC) method is used to determine the initial centroid value of the input data according to the similarity of the data: the data item with the highest degree of similarity to the other data is used as the initial centroid. The data used in this study are 500 leaf images in 10 categories of leaf type, with 50 images per category. In testing, RBF with CC achieved an average image retrieval accuracy of 90.92%, compared with an average of 85.96% for RBF with K-Means clustering.
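One possible reading of the centroid selection described above is sketched below. It is an assumption, not the authors' procedure: it uses the Pearson correlation coefficient as the similarity measure and picks the sample whose average correlation with all other samples is highest. The function names and the three feature vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def pick_initial_centroid(samples):
    """Index of the sample with the highest mean correlation to all other samples."""
    def mean_correlation(i):
        others = [pearson(samples[i], s) for j, s in enumerate(samples) if j != i]
        return sum(others) / len(others)
    return max(range(len(samples)), key=mean_correlation)


# Hypothetical feature vectors for three images in one category.
samples = [
    [1, 3, 2, 4],
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
centroid_index = pick_initial_centroid(samples)
print(centroid_index)
```

Here the second vector correlates at 0.8 and 0.6 with the other two, the highest average of the three, so it is chosen as the starting centroid.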


Citations: 1

Implementing Cosine Similarity Algorithm to Increase the Flexibility of Hematology Text Report Generation

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190549

Aulia Amirullah, I. Aulia, Dedy Arisandy

The previous hematology textual summary system, which applies a template-based Natural Language Generation method to express hematology laboratory test results in natural language, was at the cutting edge of generating more detailed hematology reports. The reports it produces break down the critical and abnormal blood components found in conventional hematology test results. These natural-language reports aim to help patients easily identify which blood components are abnormal. Templates provide slots in each generated sentence that are filled with the supplied data. However, the previous system, named T-Gen System, can only produce a fixed, inflexible set of slots for the blood components defined by the system. This inflexibility held the system back, because the generated templates cannot hold all of the critical and abnormal components found in a laboratory examination result. This research project therefore implements the cosine similarity algorithm to expand template flexibility. Testing and evaluation were carried out manually by adding components to the system one after another and examining the output. The testing shows that every blood component added in this way successfully appeared in the generated text.
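Cosine similarity itself is simple to compute. The sketch below is an assumption about how it might be used to match an incoming blood component name to a template slot; the slot labels and the matching step are hypothetical and are not taken from T-Gen System. Each string is treated as a bag of words, and the similarity is the dot product of the two word-count vectors divided by the product of their lengths.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two strings."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical slot labels and component name, for illustration only.
slots = ["hemoglobin level", "white blood cell count", "platelet count"]
component = "white blood cell"

best_slot = max(slots, key=lambda slot: cosine_similarity(component, slot))
print(best_slot, round(cosine_similarity(component, best_slot), 3))
```

A score of 1 means the two strings use the same words in the same proportions, and 0 means they share no words, so a threshold on the score could decide whether a new component fits an existing slot or needs a new one.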


Citations: 1

A Survey on The Accuracy of Machine Learning Techniques for Intrusion and Anomaly Detection on Public Data Sets

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/DATABIA50434.2020.9190436

R. T. Adek, M. Ula

Machine learning (ML) is growing in popularity due to its ability to solve problems in many areas. In the digital world, including information security, some intrusion detection systems (IDS) are being upgraded with ML elements to improve system performance. Real data sets available for information security (IS) research are known to be very limited, so much IS research relies on public data sets. However, public data sets have many limitations. The aim of this paper is to analyze the accuracy and performance of ML in intrusion detection systems and to highlight some recommendations for future research. The study is a systematic literature review of academic papers on intrusion detection that apply ML methods to public data sets. It elaborates on the use of ML algorithms in intrusion detection systems, highlighting the accuracy and limitations of the methods for detecting attackers. The goal of this research is to provide an academic base for future research on adopting ML methods for IDS.


Citations: 8

Welcome Message from the Chair

2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)

Pub Date : 2020-07-01 DOI: 10.1109/databia50434.2020.9190587

E. Yiridoe

The Korea Basic Science Institute (KBSI) is very happy and especially honored to host the 22nd International Workshop on ECR Ion Sources, ECRIS2016, the first such workshop held in Korea. Owing to our effort on the development of a 28 GHz superconducting ECRIS, we were chosen as the host institution of ECRIS2016 at the IAC of ECRIS2014. The first ECR plasma was ignited in 2014, and we have recently succeeded in extracting various ion beams from the KBSI-ECRIS. To further improve performance, the system is now being overhauled after two years of operation. To optimize the system, modifications to the plasma chamber and other components are ongoing and are expected to provide better performance.


Citations: 0
