Malware is a general term for any program intentionally designed to harm computers, servers, clients, or computer networks. Its goal is typically criminal, such as gaining unauthorized access to a system and compromising user security. Most malware reuses existing code to produce new variants, so classifying variants with similar characteristics into malware families is a useful strategy for stopping malware. This research classifies malware samples represented as bytemap grayscale images, covering 25 malware classes with a total of 9,029 images from the Malimg dataset. It implements the VGG-16 and InceptionResNet-V2 architectures under two scenarios: scenario 1 uses the original dataset and scenario 2 uses an undersampled dataset. After the models are built, each scenario is evaluated using accuracy, precision, recall, and F1-score. The highest score, 94.8%, was obtained by VGG-16 in scenario 2, and the lowest, 85.1%, by InceptionResNet-V2 in scenario 2.
{"title":"Malware Image Classification Using Deep Learning InceptionResNet-V2 and VGG-16 Method","authors":"Didih Rizki Chandranegara, Jafar Shodiq Djawas, Faiq Azmi Nurfaizi, Zamah Sari","doi":"10.15575/join.v8i1.1051","DOIUrl":"https://doi.org/10.15575/join.v8i1.1051","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72588516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
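The abstract above does not say how the undersampled dataset for scenario 2 was built. One common approach is random undersampling, where every class is cut down to the size of the smallest class. The sketch below illustrates that idea only; the function name `undersample`, the file names, and the family labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly keep as many items per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Size of the rarest class decides how many items every class keeps.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical image names and family labels, for illustration only.
files = [f"img_{i}.png" for i in range(10)]
families = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
balanced = undersample(files, families)
print(Counter(label for _, label in balanced))
```

Random undersampling discards data from the larger classes, which is one plausible reason a model can score differently on the balanced set than on the original one.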
A. Bastian, Ardi Mardiana, Mega Berliani, Mochammad Bagasnanda Firmansyah
Virtual tours are one of the rapidly growing applications of multimedia technology and are used for various purposes, including presenting information in an engaging way. The education sector also uses virtual tours for promotion, and campuses are no exception. Large virtual tour content slows access, which reduces user comfort. This study aims to compress the panoramic images displayed on a campus virtual tour using a lossless compression method, the Run Length Encoding (RLE) algorithm. First, the panoramic images are combined into one, then the individual images are compressed. When the virtual campus tour is rebuilt, the compressed images are used so that less data is transferred. The load access speed index improves from 7,233 seconds to 3,789 seconds when images are compressed from 64 bits to 8 bits, with a compression percentage of 27%. The findings show that the RLE algorithm does not yet compress large files effectively, even though it is quite successful in improving the load access of the virtual tour website.
{"title":"Run Length Encoding Compresion on Virtual Tour Campus to Enhance Load Access Performance","authors":"A. Bastian, Ardi Mardiana, Mega Berliani, Mochammad Bagasnanda Firmansyah","doi":"10.15575/join.v8i1.1000","DOIUrl":"https://doi.org/10.15575/join.v8i1.1000","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75614422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
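To make the Run Length Encoding idea in the abstract above concrete, here is a minimal byte-level sketch. It assumes each pixel is a single 8-bit value and stores every run as a `(value, count)` pair; the paper's actual file format and implementation are not described, so the function names and the sample row are illustrative only.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse consecutive equal bytes into (value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][0] == value:
            # Same byte as the previous one: extend the current run.
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            # A different byte starts a new run of length 1.
            runs.append((value, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (value, run_length) pairs back into the original bytes."""
    return bytes(value for value, count in runs for _ in range(count))


# Hypothetical row of 8-bit pixel values.
row = bytes([255, 255, 255, 255, 0, 0, 17])
encoded = rle_encode(row)
print(encoded)
assert rle_decode(encoded) == row  # lossless round trip
```

RLE only shrinks data that contains long runs of identical values. When neighbouring values rarely repeat, each byte becomes its own pair and the encoded form is larger than the input, which is consistent with the abstract's remark that RLE did not compress large files effectively.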
Crop production in Rejang Lebong District relies on rainfall, yet the data show a discrepancy between increased crop production and rainfall in the district, and the spatiotemporal distribution of the crop variables' dependencies remains unclear. This study analyses the relationship between rainfall and crop production rate in Rejang Lebong District using machine learning methods. It also performs regression analysis on clusters of rainfall and crop production, so that the cluster results show how strongly the rainfall variable influences the crop production rate in each cluster. Using the Elbow, CLARANS, Simple Linear Regression, and Silhouette Coefficient methods, the study used 231 rainfall records from the Bengkulu BMKG and 110 crop production records from BPS Bengkulu Province covering 2000-2022. The optimal number of clusters was three. C1 contains 106 records with the largest regression value for chili (0.127), C2 contains 15 records with the largest regression value for mustard greens (0.135), and C3 contains 110 records with the largest regression values for cabbage (0.408), eggplant (0.197), and carrots (0.201). Furthermore, the crops with the strongest correlation and highly significant improvement were cabbage (Y = 0.4114X + 0.2013) and chili, the latter with a high RMSE (0.9897).
{"title":"Regression Analysis for Crop Production Using CLARANS Algorithm","authors":"A. Vatresia, Ruvita Faurina, Yanti Simanjuntak","doi":"10.15575/join.v8i1.1031","DOIUrl":"https://doi.org/10.15575/join.v8i1.1031","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86187419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
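The abstract above reports fitted lines of the form Y = aX + b from simple linear regression. As a rough illustration of how such a line is obtained, the sketch below fits slope and intercept by ordinary least squares. The data points are made up for the example and are not the study's rainfall or production records.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (rainfall, production) pairs lying exactly on y = 2x + 1.
rainfall = [0.0, 1.0, 2.0, 3.0]
production = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(rainfall, production)
print(f"Y = {slope:.4f}X + {intercept:.4f}")
```

In a per-cluster analysis like the one described, a fit of this kind would be run separately on the records assigned to each cluster, assuming the inputs have been scaled comparably.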
Vinna Rahmayanti Setyaning Nastiti, Zamah Sari, Bella Chintia Eka Merita
Choosing a specialization is not an easy task for some students, especially those who lack confidence in their skills and abilities. Specialization in tertiary education is a benchmark and key to success for students' future careers. This study provides a record of learning outcomes that classifies the specialization of Informatics students, using data from 319 students of 2013-2015 who had graduated. The classification method used was the Restricted Boltzmann Machine (RBM). However, the data showed an imbalanced class distribution because the number of students in each field differed greatly. Therefore, SMOTE was added to handle the class imbalance. The combination of RBM and SMOTE achieved an accuracy of 70% with a mean squared error of 0.4.
{"title":"The Implementation of Restricted Boltzmann Machine in Choosing a Specialization for Informatics Students","authors":"Vinna Rahmayanti Setyaning Nastiti, Zamah Sari, Bella Chintia Eka Merita","doi":"10.15575/join.v8i1.917","DOIUrl":"https://doi.org/10.15575/join.v8i1.917","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75268030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
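SMOTE, mentioned in the abstract above, balances classes by creating synthetic minority samples rather than duplicating existing ones. Each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows only that core interpolation step under simplifying assumptions (numeric features, Euclidean distance, at least two minority samples); the function name, parameters, and data are hypothetical, and the study's own settings are not given in the abstract.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the segment from base towards the neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors (for example, two course grades).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_points = smote(minority, n_new=4)
print(new_points)
```

Because every synthetic point is an interpolation of two real minority samples, it always falls inside the region those samples already span.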
I. Kurniawan, N. Kamil, A. Aditsania, E. B. Setiawan
Cancer is a disease caused by the abnormal growth of cells in body tissues and is commonly treated by chemotherapy. Cancer cells respond to chemotherapy at first, but over time they develop resistance, so new anti-cancer drugs are needed. Indenopyrazole and its derivatives have been investigated as potential anti-cancer drugs. This study aims to predict the activity of indenopyrazole derivative compounds as anti-cancer drugs using Ant Colony Optimization (ACO) and Artificial Neural Network (ANN) methods. We used 93 indenopyrazole derivative compounds with a total of 1876 descriptors. The descriptors were first reduced using the Pearson Correlation Coefficient (PCC) and then selected with the ACO algorithm to obtain the most relevant features. The best number of descriptors obtained from ACO was ten. The ANN prediction model was developed with three architectures that differ in the number of hidden layers, i.e., 1, 2, and 3. The model with three hidden layers gave the best performance, with R² test, R² train, and Q² train values of 0.8822, 0.8495, and 0.8472, respectively.
{"title":"Implementation of Ant Colony Optimization – Artificial Neural Network in Predicting the Activity of Indenopyrazole Derivative as Anti-Cancer Agent","authors":"I. Kurniawan, N. Kamil, A. Aditsania, E. B. Setiawan","doi":"10.15575/join.v8i1.1055","DOIUrl":"https://doi.org/10.15575/join.v8i1.1055","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74752625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
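The abstract above says the descriptors were reduced with the Pearson Correlation Coefficient before ACO, without giving the exact rule. One common way to use PCC for this is to drop any descriptor that is almost perfectly correlated with one already kept. The sketch below assumes that rule; the threshold of 0.9, the function names, and the descriptor values are illustrative assumptions, and it requires every column to be non-constant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a descriptor only if |r| with every kept descriptor is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


# Hypothetical descriptor columns: d2 is exactly twice d1, so it is redundant.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.0],
    "d3": [4.0, 1.0, 3.0, 2.0],
}
print(drop_correlated(descriptors))
```

A filter like this is cheap and removes obvious redundancy, leaving a smaller candidate set for a more expensive search such as ACO.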
As Indonesia is a democratic country, its people hold an important role in determining who holds power. The nearest political agenda in Indonesia is the 2024 Election. Surveys by several private survey agencies on the 2024 political map have revealed five top names: Prabowo Subianto, Ganjar Pranowo, Anies Baswedan, Sandiaga Uno, and Ridwan Kamil. This study aims to describe the initial map of the 2024 Election through a sentiment analysis approach to Twitter data, using tweets that mention the five political figures during 2021. In general, the Twitter users who are for or against the five figures are located on Java, are in the 19–29 age group, and are male. The sentiment analysis uses supervised learning, with a different method for each figure chosen according to the best evaluation value obtained for that figure. The results showed that Ganjar Pranowo had the most positive-sentiment tweets and the most pro accounts, while Prabowo Subianto had the most negative-sentiment tweets and the most contra accounts. Words that often appear in positive-sentiment tweets about a figure are expressions of hope, prayer, and support. In negative tweets, the words that appear most often relate to the figure's field or region of work.
{"title":"Delineation of The Early 2024 Election Map: Sentiment Analysis Approach to Twitter Data","authors":"Nur Ulum Rahmanulloh, Ibnu Santoso","doi":"10.15575/join.v7i2.925","DOIUrl":"https://doi.org/10.15575/join.v7i2.925","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73559416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart disease is still the leading cause of death. In this study, we tested several factors that can identify patients with heart disease using three classification algorithms: Naive Bayes, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The purpose of the study is to find out which algorithm produces the highest accuracy and to obtain confusion matrix values when predicting heart disease from several factors or comorbidities, ranging from BMI to the patient's skin cancer status. In the trials, the SVM algorithm had the highest accuracy, 92%, while the Naive Bayes algorithm had the lowest, 88%.
{"title":"Comparative Analysis of Naive Bayes, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) Algorithms for Classification of Heart Disease Patients","authors":"Aina Damayunita, R. Fuadi, C. Juliane","doi":"10.15575/join.v7i2.919","DOIUrl":"https://doi.org/10.15575/join.v7i2.919","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81968370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
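The abstract above compares classifiers by accuracy and confusion matrix values. As a reminder of how those numbers relate, the sketch below derives the four confusion-matrix counts for a binary problem and computes accuracy, precision, recall, and F1 from them. The label vectors are invented for the example and do not come from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and the usual metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

When one class is much rarer than the other, accuracy alone can look high even if few positives are found, which is why precision and recall are usually reported alongside it.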
One use of medical data from diabetes patients is to produce models that medical personnel can use to predict and identify diabetes in patients. Various techniques, including machine learning, are used to provide a diabetes model as early as possible based on the symptoms experienced by diabetic patients. The machine learning technique used to predict diabetes in this study is extreme gradient boosting (XGBoost). XGBoost is an advanced implementation of gradient boosting with multiple regularization factors that predicts target variables accurately by combining the estimates of a set of simpler, weaker models. Each subsequent model tries to correct the errors made by the previous model by adding some weight to the model. The diabetes prediction model using XGBoost is shown in the form of a tree, and the model produced in this study has an accuracy of 98.71%.
{"title":"Diabetes Risk Prediction Using Extreme Gradient Boosting (XGBoost)","authors":"Kartina Diah Kesuma Wardhani, Memen Akbar","doi":"10.15575/join.v7i2.970","DOIUrl":"https://doi.org/10.15575/join.v7i2.970","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80662654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iman Setiawan, J. Junaidi, Fadjryani Fadjryani, Fika Reski Amaliah
Technology has been widely applied in agriculture, including automation and the use of big data through the Internet of Things (IoT). IoT allows a process to run automatically without human intervention. Extreme weather changes and limited land are among the main problems in agriculture, and many IoT devices have been developed to address them. One of them is a soil moisture detection system. This study aims to build an IoT soil moisture detection system. The system uses a sensor as input, processes the readings on a microcontroller device, and sends the prediction results to an IoT cloud platform. Predictions are obtained using a time series model, chosen because the observed soil moisture data are ordered in time, and its performance is evaluated using RMSE. The results indicate that the soil moisture IoT system works well, which is supported by the very small prediction error of the model, RMSE = 1.175682 × 10⁻⁵.
{"title":"Internet of Things (IoT) for Soil Moisture Detection Using Time Series Model","authors":"Iman Setiawan, J. Junaidi, Fadjryani Fadjryani, Fika Reski Amaliah","doi":"10.15575/join.v7i2.951","DOIUrl":"https://doi.org/10.15575/join.v7i2.951","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90952487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
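The abstract above evaluates its time series predictions with RMSE but does not name the model. The sketch below shows how RMSE is computed, using a naive persistence forecast (each prediction is simply the previous reading) as a stand-in for the unspecified model. The sensor readings are made up for the example.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical soil moisture readings, in percent, at regular intervals.
readings = [40.0, 41.0, 43.0, 42.0]

# Persistence forecast: predict that the next reading equals the current one.
predicted = readings[:-1]
observed = readings[1:]
error = rmse(observed, predicted)
print(f"RMSE = {error:.4f}")
```

RMSE is expressed in the same unit as the readings, so whether a value counts as small depends on the scale of the sensor output.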
V. Utomo, Tirta Yurista Kumkamdhani, Galih Setiarso
Corruption is a major problem for many countries and has a negative impact on a nation's economy. It is also recognized that corruption comes from two sides: demand from the authorities and supply from corporations. In that regard, corporations can take part in the fight against corruption through anti-corruption disclosure (ACD). This study proposes a new method of predicting corporate ACD using deep learning. The data are taken from all companies listed on the Indonesia Stock Exchange (IDX) from 2017 to 2019. The companies fall into 9 categories and the data set has 8 features. The data comprise 1826 items, of which 1032 are ACD and the other 794 are non-ACD. The deep neural network is composed of an input layer, an output layer, and 3 hidden layers. It uses the Adam optimizer with a learning rate of 0.0010, a batch size of 16, and 500 epochs, and the dropout is set to 0.05. The accuracy of deep learning in predicting ACD is considered good, with an average training accuracy of 74.76% and an average testing accuracy of 76.37%. However, the loss is not good, with average training and testing losses of 51.76% and 50.96%, respectively. Since the aim of the study is to examine deep learning as an alternative to logistic regression in ACD prediction, the accuracy of the two methods is compared. Deep learning, with an average prediction accuracy of 76.37%, is better than logistic regression, with an average accuracy of 67.15%. Deep learning also has higher minimum and maximum accuracy than logistic regression. This study concludes that deep learning may offer an alternative for ACD prediction to the more common method of logistic regression.
{"title":"Anti-Corruption Disclosure Prediction Using Deep Learning","authors":"V. Utomo, Tirta Yurista Kumkamdhani, Galih Setiarso","doi":"10.15575/join.v7i2.840","DOIUrl":"https://doi.org/10.15575/join.v7i2.840","url":null,"PeriodicalId":32019,"journal":{"name":"JOIN Jurnal Online Informatika","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89326710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}