
JOIN Jurnal Online Informatika: Latest Articles

Malware Image Classification Using Deep Learning InceptionResNet-V2 and VGG-16 Method
Pub Date : 2023-06-28 DOI: 10.15575/join.v8i1.1051
Didih Rizki Chandranegara, Jafar Shodiq Djawas, Faiq Azmi Nurfaizi, Zamah Sari
Malware is a general term for any program intentionally designed to damage computers, servers, clients, or computer networks. Its goal is to commit a crime, such as gaining unauthorized access to a particular system and thereby compromising user security. Most malware variants are produced by reusing the same code in a different form. Therefore, classifying malware variants with similar characteristics into malware families is a good strategy for stopping malware. This research classifies malware samples represented as bytemap grayscale images. It covers 25 malware classes with a total of 9,029 images from the Malimg dataset. The VGG-16 and InceptionResNet-V2 architectures are evaluated in two scenarios: scenario 1 uses the original dataset and scenario 2 uses an undersampled dataset. Each scenario is evaluated using accuracy, precision, recall, and F1-score. The highest score was obtained in scenario 2 with VGG-16 (94.8%) and the lowest in scenario 2 with InceptionResNet-V2 (85.1%).
Citations: 1
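The undersampled scenario mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which undersampling method was used; random undersampling down to the size of the rarest class is an assumption here, and the function name `undersample` is hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly keep as many items per class as the rarest class has.

    Assumption: plain random undersampling; the paper may use another scheme.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the smallest class decides how many items every class keeps.
    target = min(len(items) for items in by_class.values())

    rng = random.Random(seed)
    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: 7 items of class "a", 3 items of class "b".
data = undersample(list(range(10)), ["a"] * 7 + ["b"] * 3)
```

After the call, both classes contribute three items each, so a classifier trained on `data` no longer sees the majority class more often.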
Run Length Encoding Compresion on Virtual Tour Campus to Enhance Load Access Performance
Pub Date : 2023-06-28 DOI: 10.15575/join.v8i1.1000
A. Bastian, Ardi Mardiana, Mega Berliani, Mochammad Bagasnanda Firmansyah
Virtual tours are one of the rapidly growing applications of multimedia technology and are used for various purposes, including disseminating information in an engaging way. The education sector also uses virtual tours for promotion, and campuses are no exception. Large virtual tour content causes slow access, which reduces user comfort. This study compresses the panoramic images displayed on a campus virtual tour using a lossless method, the Run Length Encoding (RLE) algorithm. First, the panoramic images are combined into one, then individual images are compressed. The compressed images are used when rebuilding the virtual campus tour so that less data is transferred. The load access speed index improves from 7,233 seconds to 3,789 seconds when images are compressed from 64 bits to 8 bits, with a compression percentage of 27%. The study finds that the RLE algorithm could not compress large files effectively, even though it was quite successful in improving the load access performance of the virtual tour website.
Citations: 0
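As a rough illustration of the Run Length Encoding idea named in the abstract above, the following sketch encodes a byte sequence as (value, run length) pairs and decodes it back. It is a generic byte-level RLE, not the authors' implementation; the function names are hypothetical.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse consecutive equal bytes into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][0] == value:
            # Same byte as the previous one: extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # Different byte: start a new run of length 1.
            runs.append((value, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return bytes(value for value, count in runs for _ in range(count))


# A pixel row with long uniform stretches compresses well.
row = bytes([255] * 5 + [0] * 3 + [255])
encoded = rle_encode(row)

# A row with no repeats does not: every byte becomes its own pair.
noisy = rle_encode(bytes([1, 2, 3]))
```

The second case hints at why RLE can struggle on detailed photographic images: when neighbouring pixels rarely repeat, the list of pairs is no smaller than the input.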
Regression Analysis for Crop Production Using CLARANS Algorithm
Pub Date : 2023-06-28 DOI: 10.15575/join.v8i1.1031
A. Vatresia, Ruvita Faurina, Yanti Simanjuntak
Crop production in Rejang Lebong district relies on rainfall. Data showed a discrepancy between increased crop production and rainfall in the district, and the spatiotemporal distribution of the crop variables' dependencies remains unclear. This study analyses the relationship between rainfall and crop production rate in Rejang Lebong district using machine learning methods. Rainfall data are clustered, and regression analysis is then performed within each cluster to determine how much the rainfall variable influences the crop production rate in that cluster. Using the Elbow, CLARANS, Simple Linear Regression, and Silhouette Coefficient methods, the study used 231 rainfall records from the Bengkulu BMKG and 110 crop production records from BPS Bengkulu Province for 2000-2022. The optimal number of clusters was 3. C1 contains 106 records, with the largest regression value for chili (0.127); C2 contains 15 records, with the largest regression value for mustard greens (0.135); and C3 contains 110 records, with the largest regression values for cabbage (0.408), eggplant (0.197), and carrots (0.201). The study also found that the strongest relationship, with a highly significant improvement, was for cabbage (Y=0.4114X+0.2013) and for chili, which had a high RMSE (0.9897).
Citations: 0
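The simple linear regression step in the abstract above produces a line of the form Y = aX + b. The sketch below fits such a line by ordinary least squares. The data points are synthetic, generated from the reported cabbage equation purely for illustration; they are not the study's data, and how X and Y were scaled in the paper is not stated.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on Y = 0.4114 X + 0.2013 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.4114 * x + 0.2013 for x in xs]
slope, intercept = fit_line(xs, ys)
```

Because the points lie on the line, the fit recovers the slope 0.4114 and intercept 0.2013 up to floating-point error; real rainfall data would scatter around the line.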
The Implementation of Restricted Boltzmann Machine in Choosing a Specialization for Informatics Students
Pub Date : 2023-06-28 DOI: 10.15575/join.v8i1.917
Vinna Rahmayanti Setyaning Nastiti, Zamah Sari, Bella Chintia Eka Merita
Choosing a specialization is not an easy task for some students, especially those who lack confidence in their skills and abilities. Specialization in tertiary education is a benchmark and a key to success for students' future careers. This study classifies Informatics students into specializations based on learning outcome records, using data from 319 students of the 2013-2015 cohorts who had graduated. The classification method was the Restricted Boltzmann Machine (RBM). Because the number of students in each field differed greatly, the class distribution was imbalanced, so SMOTE was added to handle the imbalance. The combination of RBM and SMOTE achieved an accuracy of 70% with a mean squared error of 0.4.
Citations: 0
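The SMOTE step in the abstract above creates synthetic minority-class samples by interpolating between a real sample and one of its nearest neighbours. The following is a simplified sketch of that core idea, not the authors' pipeline; the function name `smote_like` and the parameters `k` and `seed` are hypothetical, and it needs at least two minority samples.

```python
import random
from math import dist


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours.

    Simplified sketch of the SMOTE idea; assumes len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest other minority samples by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment from base to the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


points = smote_like([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=5)
```

Each synthetic point lies on a segment between two real minority samples, so the minority class grows without exact duplicates.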
Implementation of Ant Colony Optimization – Artificial Neural Network in Predicting the Activity of Indenopyrazole Derivative as Anti-Cancer Agent
Pub Date : 2023-06-28 DOI: 10.15575/join.v8i1.1055
I. Kurniawan, N. Kamil, A. Aditsania, E. B. Setiawan
Cancer is a disease induced by the abnormal growth of cells in body tissues. It is commonly treated by chemotherapy. Cancer cells initially respond to chemotherapy, but over time they develop resistance. Therefore, new anti-cancer drugs need to be developed. Indenopyrazole and its derivatives have been investigated as potential drugs to treat cancer. This study aims to predict the activity of indenopyrazole derivative compounds as anti-cancer drugs using Ant Colony Optimization (ACO) and Artificial Neural Network (ANN) methods. We used 93 indenopyrazole derivative compounds with a total of 1876 descriptors. The descriptors were first reduced using the Pearson Correlation Coefficient (PCC), and the ACO algorithm was then applied to select the most relevant features. The best number of descriptors obtained from ACO was ten. The ANN prediction model was developed with three architectures that differ in the number of hidden layers: 1, 2, and 3. The model with three hidden layers gave the best performance, with R² test, R² train, and Q² train values of 0.8822, 0.8495, and 0.8472, respectively.
Citations: 0
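The abstract above says descriptors were reduced with the Pearson Correlation Coefficient before ACO, but not how. One common approach, assumed here, is to drop any descriptor that is highly correlated with one already kept. The threshold of 0.9 and the names `pearson` and `drop_correlated` are illustrative choices, not taken from the paper.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Keep a descriptor only if |r| with every kept descriptor is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


# Toy descriptors: d2 is an exact multiple of d1, d3 is only weakly related.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.0],
    "d3": [1.0, -1.0, 1.0, -1.0],
}
selected = drop_correlated(descriptors)
```

Here `d2` is removed because it carries the same information as `d1`, while `d3` survives; a later selection stage such as ACO would then search over the remaining descriptors.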
Delineation of The Early 2024 Election Map: Sentiment Analysis Approach to Twitter Data
Pub Date : 2022-12-29 DOI: 10.15575/join.v7i2.925
Nur Ulum Rahmanulloh, Ibnu Santoso
In Indonesia, a democratic country, the people hold an important role in determining power. The next item on Indonesia's political agenda is the 2024 Election. Several private survey agencies have surveyed the 2024 political map, revealing five top names: Prabowo Subianto, Ganjar Pranowo, Anies Baswedan, Sandiaga Uno, and Ridwan Kamil. This study aims to describe the initial map of the 2024 Election through sentiment analysis of Twitter data. It uses tweets from 2021 that mention the five political figures. In general, the Twitter users who are pro or contra the five figures are located on Java, aged 19–29, and male. The sentiment analysis uses supervised learning, with a different method for each figure chosen according to the best evaluation value for that figure. The results showed that Ganjar Pranowo had the most positive-sentiment tweets and the highest number of pro accounts, while Prabowo Subianto had the most negative-sentiment tweets and the highest number of contra accounts. Words that often appear in positive tweets about a figure are expressions of hope, prayer, and support. In negative tweets, the frequent words relate to the figures' field or region of work.
Citations: 0
Comparative Analysis of Naive Bayes, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) Algorithms for Classification of Heart Disease Patients
Pub Date : 2022-12-29 DOI: 10.15575/join.v7i2.919
Aina Damayunita, R. Fuadi, C. Juliane
Heart disease is still the leading cause of death. In this study, we tested several factors that can identify patients with heart disease using three classification algorithms: Naive Bayes, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The purpose of the study is to find out which algorithm achieves the highest accuracy, using confusion matrix values, when predicting heart disease from several factors or comorbidities of the patient, ranging from BMI to skin cancer status. In the trials, the SVM algorithm had the highest accuracy at 92%, while the Naive Bayes algorithm had the lowest at 88%.
Citations: 1
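The confusion matrix values and accuracy mentioned in the abstract above can be computed as in the following sketch for a binary classifier. The labels are toy values, not the study's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return accuracy, precision, recall, f1


# Toy labels: 1 = heart disease, 0 = no heart disease.
counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
accuracy, precision, recall, f1 = scores(*counts)
```

Accuracy alone can hide poor performance on a rare positive class, which is why precision and recall are usually reported alongside it.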
Diabetes Risk Prediction Using Extreme Gradient Boosting (XGBoost)
Pub Date : 2022-12-29 DOI: 10.15575/join.v7i2.970
Kartina Diah Kesuma Wardhani, Memen Akbar
One use of medical data from diabetes patients is to produce models that medical personnel can use to predict and identify diabetes in patients. Various techniques, including machine learning, are used to provide a diabetes model as early as possible based on the symptoms experienced by diabetic patients. The machine learning technique used to predict diabetes in this study is extreme gradient boosting (XGBoost). XGBoost is an advanced implementation of gradient boosting with multiple regularization factors that predicts target variables accurately by combining the estimates of a set of simpler, weaker models. Each new model tries to correct the errors made by the previous model by adding some weight to the model. The resulting diabetes prediction model is presented as a tree and achieves an accuracy of 98.71%.
Citations: 3
Internet of Things (IoT) for Soil Moisture Detection Using Time Series Model
Pub Date : 2022-12-29 DOI: 10.15575/join.v7i2.951
Iman Setiawan, J. Junaidi, Fadjryani Fadjryani, Fika Reski Amaliah
Technology has been widely applied in agriculture, including automation and the use of big data through the Internet of Things (IoT). IoT allows a process to run automatically without human intervention. Extreme weather changes and limited land are among the main problems in agriculture, and many IoT devices have been developed to address them. One of them is a soil moisture detection system. This study aims to build an IoT soil moisture detection system. The system uses a sensor as input, processes the readings in a microcontroller device, and sends the prediction results to an IoT cloud platform. Predictions are obtained using a time series model, and its performance is evaluated using RMSE. This model was chosen because the observed soil moisture data are structured by time. The results indicate that the soil moisture IoT system works well. This is supported by the very small prediction error of the model, RMSE = 1.175682 × 10⁻⁵.
Citations: 0
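The soil moisture abstract above evaluates its time series forecasts with RMSE, the square root of the mean squared difference between observed and predicted values. A minimal sketch of that metric follows. The function name `rmse` and the sample readings are illustrative assumptions, not code or data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical soil moisture readings and forecasts (not data from the study).
observed = [0.31, 0.30, 0.29, 0.33]
forecast = [0.30, 0.30, 0.30, 0.32]
print(rmse(observed, forecast))
```

RMSE is expressed in the same unit as the measurements, so a value such as the reported 1.175682 × 10⁻⁵ is only meaningful relative to the scale of the soil moisture readings, which the abstract does not state.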
Anti-Corruption Disclosure Prediction Using Deep Learning
Pub Date : 2022-12-29 DOI: 10.15575/join.v7i2.840
V. Utomo, Tirta Yurista Kumkamdhani, Galih Setiarso
Corruption is a major problem in many countries and harms national economies. It is widely recognized that corruption has two sides: demand from authorities and supply from corporations. In that regard, corporations can take part in the fight against corruption through anti-corruption disclosure (ACD). This study proposes a new method for predicting corporate ACD using deep learning. The data are taken from all companies listed on the Indonesia Stock Exchange (IDX) from 2017 to 2019. The companies fall into 9 categories, and the data set has 8 features. The data set contains 1,826 items, of which 1,032 are ACD and the other 794 are non-ACD. The deep neural network consists of an input layer, an output layer, and 3 hidden layers. It uses the Adam optimizer with a learning rate of 0.0010, a batch size of 16, and 500 epochs. Dropout is set to 0.05. The accuracy of deep learning in predicting ACD is considered good, with an average training accuracy of 74.76% and an average testing accuracy of 76.37%. However, the loss is not good: the average training loss and testing loss are 51.76% and 50.96%, respectively. Because the study aims to assess deep learning as an alternative to logistic regression for ACD prediction, the accuracy of the two methods is compared. Deep learning's average prediction accuracy of 76.37% is better than logistic regression's 67.15%. Deep learning also has higher minimum and maximum accuracy than logistic regression. The study concludes that deep learning may offer an alternative to the more common logistic regression for ACD prediction.
Citations: 0
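One way to read the accuracies in the anti-corruption disclosure abstract above is against a majority-class baseline, the accuracy of always predicting the more frequent label. The sketch below computes that baseline from the class counts given in the abstract. The baseline comparison is an added illustration, not part of the study, and it assumes the test split has roughly the same class balance as the full data set, which the abstract does not confirm. The helper name `majority_baseline` is hypothetical.

```python
# Class counts as reported in the abstract.
ACD_ITEMS = 1032
NON_ACD_ITEMS = 794
TOTAL_ITEMS = ACD_ITEMS + NON_ACD_ITEMS

# Average testing accuracies as reported in the abstract.
REPORTED_TEST_ACCURACY = {
    "deep learning": 0.7637,
    "logistic regression": 0.6715,
}


def majority_baseline(positive, negative):
    """Accuracy of always predicting the more frequent class."""
    return max(positive, negative) / (positive + negative)


# Assumption: the full-data class balance also holds for the test split.
baseline = majority_baseline(ACD_ITEMS, NON_ACD_ITEMS)
print(f"majority-class baseline: {baseline:.4f}")
for name, accuracy in REPORTED_TEST_ACCURACY.items():
    print(f"{name}: {accuracy:.4f} ({accuracy - baseline:+.4f} vs baseline)")
```

Under that assumption, both reported accuracies can be compared with the share of the larger class (1,032 of 1,826 items) to see how much each model adds over a trivial predictor.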
Journal
JOIN Jurnal Online Informatika