
Public Research Journal of Engineering, Data Technology and Computer Science: Latest Articles

Performance Comparison of Random Forest, Support Vector Machine and Neural Network in Health Classification of Stroke Patients
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1119
Windy Junita Sari, Nasya Amirah Melyani, Fadlan Arrazak, Muhammad Asyraf Bin Anahar, Ezza Addini, Zaid Husham Al-Sawaff, Selvakumar Manickam
Stroke is the second most common cause of death globally, accounting for about 11% of all health-related deaths each year. The condition, caused by non-traumatic cerebral circulatory disorders, ranges from mild to severe and can cause temporary or permanent damage. This research began with data understanding: a stroke patient health dataset of 5110 records was acquired from Kaggle. The pre-processing stage involved transforming the data to optimize processing, converting numeric attributes to nominal, and preparing training and test data. The focus then shifted to stroke classification using the Random Forest, Support Vector Machine (SVM), and Neural Network algorithms. All three models performed well on the Kaggle dataset, with Random Forest achieving 98.58% accuracy, SVM 94.11%, and Neural Network 95.72%. SVM has the highest recall (99.41%), while Random Forest and the Neural Network (ANN) have high but slightly lower recall, 98.58% and 95.72% respectively. Model selection depends on the needs of the application: precision, recall, or a balance of both. This research contributes to further understanding of stroke diagnosis and introduces new potential for classifying the disease.
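The accuracy, precision, and recall figures quoted above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions; the function name `classification_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```

In a screening setting such as stroke detection, a missed case (a false negative) is usually the costlier error, which is one reason a model with higher recall may be preferred even when its accuracy is lower.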
Citations: 0
Classification of Diabetes Mellitus Sufferers Eating Patterns Using K-Nearest Neighbors, Naïve Bayes and Decission Tree
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1103
Ayuni Fachrunisa Lubis, Hilmi Zalnel Haq, Indah Lestari, Muhammad Iltizam, Nitasnim Samae, Muhammad Aufi Rofiqi, Sakhi Hasan Abdurrahman, Balqis Hamasatiy Tambusai, Puja Khalwa Salsilah
The study investigates three classification algorithms, namely K-Nearest Neighbor (K-NN), Naïve Bayes, and Decision Tree, for the classification of Diabetes Mellitus using a dataset from Kaggle. K-NN relies on distance calculations between test and training data, using the Euclidean distance formula. The choice of k, the number of nearest neighbors considered, significantly influences K-NN's effectiveness. Naïve Bayes, a probabilistic method, predicts class probabilities based on past events and employs the Gaussian distribution for continuous data. Decision Trees form prediction models with easily implementable rules. Data collection involves obtaining a Diabetes Mellitus dataset with eight attributes. Data preprocessing includes cleaning and normalization to minimize inconsistent and incomplete data. The classification algorithms are applied using the RapidMiner tool, and the results are compared for accuracy. Naïve Bayes yields 77.34% accuracy, K-NN performance depends on the chosen k value, and Decision Trees generate rules for classification. The study provides insights into the strengths and weaknesses of each algorithm for diabetes classification.
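The K-NN procedure described above (Euclidean distance to every training point, then a majority vote among the k closest) can be sketched in a few lines. This is an illustrative sketch only: the study used RapidMiner, and the function names, feature values, and labels below are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data, not from the Kaggle dataset.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["neg", "neg", "neg", "pos", "pos", "pos"]

prediction = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(prediction)
```

Because the vote depends on raw distances, features on larger scales dominate unless the data are normalized first, which is consistent with the normalization step the abstract mentions.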
Citations: 0
Application of Recurrent Neural Network Bi-Long Short-Term Memory, Gated Recurrent Unit and Bi-Gated Recurrent Unit for Forecasting Rupiah Against Dollar (USD) Exchange Rate
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1094
Muhammad Fauzi Fayyad, Viki Kurniawan, Muhammad Ridho Anugrah, Baihaqi Hilmi Estanto, Tasnim Bilal
Foreign exchange rates play a crucial role in a country's economic development and influence long-term investment decisions. This research aims to forecast the exchange rate of the Rupiah against the United States Dollar (USD) using deep learning models based on the Recurrent Neural Network (RNN) architecture, specifically Bi-Long Short-Term Memory (Bi-LSTM), Gated Recurrent Unit (GRU), and Bi-Gated Recurrent Unit (Bi-GRU). Historical daily exchange rate data from January 1, 2013 to November 3, 2023, obtained from Yahoo Finance, were used as the dataset. The models were trained and evaluated under various settings of optimizer, batch size, and time step. The best model was identified by minimizing the Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Among the models tested, the GRU model with the Nadam optimizer, a batch size of 16, and a time step of 30 performed best, with MSE 3741.6999, RMSE 61.1694, MAE 45.6246, and MAPE 0.3054%. The forecast indicates a strengthening trend of the Rupiah against the USD over the next 30 days, which could be taken into consideration in investment decisions and suggests promising economic growth prospects for Indonesia.
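The four error measures used to rank the models can be written out directly. The sketch below uses the standard textbook definitions; the function name `forecast_errors` and the sample values are hypothetical and are not the paper's data.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired series."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so it assumes no actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical values for illustration only.
result = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(result)
```

RMSE is the square root of MSE, and the reported pair is consistent with that relationship: the square root of 3741.6999 is approximately 61.1694. MSE, RMSE, and MAE are expressed in the units of the series (here Rupiah per USD, or its square for MSE), whereas MAPE is scale-free.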
Citations: 0
Evaluation of the Effectiveness of Neural Network Models for Analyzing Customer Review Sentiments on Marketplace
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1100
Kana Karunia, Aprilya Eka Putri, May Dila Fachriani, Muhammad Hilman Rois
According to a 2019 report, Tokopedia is the most visited marketplace in Indonesia, with 140,000,000 visitors per month, making it one of the country's most popular marketplaces. Customers can write reviews of the products they purchase at the end of the transaction process on Tokopedia. The aim of this research is to conduct sentiment analysis on product reviews on Tokopedia. Three neural networks are used for text classification: Bi-GRU, GRU, and LSTM. The data are divided into training and testing samples in an 80:20 ratio using the holdout technique. The Bi-GRU algorithm has an accuracy of 0.93 and a precision of 0.96, better than the other two methods, LSTM and GRU, which each have an accuracy of 0.92 and a recall of 0.91.
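The holdout technique mentioned above sets aside a fixed share of the records once, before training. A minimal sketch under the assumption of a simple shuffled 80:20 split follows; the function name `holdout_split`, the seed, and the placeholder records are hypothetical, and the paper does not state whether its split was shuffled or stratified.

```python
import random


def holdout_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for review texts.
reviews = [f"review {i}" for i in range(10)]
train_set, test_set = holdout_split(reviews)
print(len(train_set), len(test_set))
```

A single holdout split is cheap but gives one estimate only; results can shift with the seed, especially on small or imbalanced datasets.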
Citations: 0
Application of The Fuzzy Mamdani Method in Determining KIP-Kuliah Recipients for New Students
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1087
Yoga Ardiansah, Nanda Try Luchia, Delvi Hastari, T. M. F. Rifat, Rendhy Rachfaizi, Nanda Aulia Putri, Ella Silvana Ginting
Higher education is the final level of formal education. However, not everyone has the opportunity to pursue it, because of economic constraints. Therefore, an assessment method is needed to support decisions on KIP-Kuliah recipients among new students in the Faculty of Science and Technology, Sultan Syarif Kasim Riau State Islamic University. This research applies the Fuzzy Mamdani algorithm and is expected to provide recommendations for deserving scholarship recipients so that the assistance reaches the right targets. The results showed that 26.7% of students received the rejected status. Several experiments illustrate that Fuzzy Logic performs very well in this research for determining policies and as decision support. The implementation of the research results recommends the best selection from a series of decisions.
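Mamdani inference proceeds in four steps: fuzzify the inputs, fire each rule using min for AND, clip each output set at its rule's firing strength and combine the clipped sets with max, then defuzzify by centroid. The sketch below illustrates those steps only. The input variables (income, dependents), their ranges, the membership functions, and the three rules are all assumptions made for illustration and are not the criteria used in the paper.

```python
def triangle(x, a, b, c):
    """Triangular membership: 0 at or beyond a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def mamdani_eligibility(income, dependents):
    """Return a 0-100 eligibility score from two hypothetical inputs."""
    # Clamp inputs to the assumed universes (income 0-10, dependents 0-6).
    income = min(max(income, 0.0), 10.0)
    dependents = min(max(dependents, 0.0), 6.0)

    # Step 1: fuzzification (hypothetical membership functions).
    income_low = triangle(income, -1, 0, 6)
    income_high = triangle(income, 4, 10, 11)
    dep_few = triangle(dependents, -1, 0, 4)
    dep_many = triangle(dependents, 2, 6, 7)

    # Step 2: rule firing strengths, with AND taken as min.
    fire_high = min(income_low, dep_many)    # low income AND many dependents
    fire_medium = min(income_low, dep_few)   # low income AND few dependents
    fire_low = income_high                   # high income

    # Steps 3 and 4: clip each output set, aggregate with max, take the centroid.
    numerator = 0.0
    denominator = 0.0
    for score in range(0, 101):
        mu = max(
            min(fire_low, triangle(score, -1, 0, 50)),
            min(fire_medium, triangle(score, 25, 50, 75)),
            min(fire_high, triangle(score, 50, 100, 101)),
        )
        numerator += score * mu
        denominator += mu
    return numerator / denominator if denominator else 0.0


high_need = mamdani_eligibility(income=1, dependents=5)
low_need = mamdani_eligibility(income=9, dependents=1)
print(round(high_need, 1), round(low_need, 1))
```

A crisp accept or reject decision, such as the rejected status reported in the abstract, would then follow from comparing the score against a cutoff, which this sketch leaves out because the paper's threshold is not given here.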
Citations: 0
Implementation of K-Nearest Neighbors, Naïve Bayes Classifier, Support Vector Machine and Decision Tree Algorithms for Obesity Risk Prediction
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1110
Amanda Iksanul Putri, Nur Alfa Husna, Neha Mella Cia, Muhammad Abdillah Arba, Nasywa Rihadatul Aisyi, Chintya Harum Pramesthi, Abidaharbya Salsa Irdayusman
Obesity is an abnormal or excessive build-up of fat, resulting from an imbalance between calories consumed and calories burnt, that can negatively impact health. The majority of ailments, such as diabetes, heart disease, cancer, osteoarthritis, chronic renal disease, stroke, hypertension, and other fatal conditions, are linked to obesity. Information technology has therefore been the subject of several studies aimed at diagnosing and treating obesity. Because there is a wealth of information on obesity, data mining techniques such as the K-Nearest Neighbors (K-NN) algorithm, Naïve Bayes Classifier, Support Vector Machine (SVM), and Decision Tree can be used to classify the data. This study uses an obesity dataset from Kaggle with 2111 records and 17 attributes and compares the four algorithms. On this dataset, the Decision Tree algorithm outperforms the other three: Decision Tree achieved an accuracy of 84.98%, K-NN came second with 83.55%, Naïve Bayes third with 77.48%, and SVM last with 77.32%, the lowest accuracy in this study.
Citations: 0
Classifications of Offline Shopping Trends and Patterns with Machine Learning Algorithms
Pub Date : 2024-04-21 DOI: 10.57152/predatecs.v2i1.1099
Muta'alimah Muta'alimah, Cindy Kirana Zarry, Atha Kurniawan, Hauriya Hasysya, Muhammad Farhan Firas, Nurin Nadhirah
Advancements in technology have made online shopping popular. However, offline marketing is still considered a profitable and important means of business development. This is reflected in a 2022 statement by the Association of Retail Entrepreneurs of Indonesia (APRINDO) that 60% of Indonesians shop offline, and in the expectation that in 2023 more than 75% of continental European consumers will prefer to shop offline. The reason is that offline marketing offers many benefits that online marketing cannot. This research was conducted to help offline retailers understand the consumption patterns and trends that affect purchases, and it classifies those patterns and trends in order to compare the results of the algorithms under study. The algorithms analyzed are K-Nearest Neighbor (K-NN), Naive Bayes, and Artificial Neural Network (ANN). The ANN algorithm obtained the best confusion matrix results, with an Accuracy of 96.38%, Precision of 100.00%, and Recall of 100.00%. Meanwhile, Naive Bayes had the lowest Accuracy, 57.39%, with a Precision of 57.86%, and the Recall of K-NN was as low as 92.00%. These results indicate that the ANN algorithm is better at classifying offline shopping image data than the K-NN and Naive Bayes algorithms.
Citations: 0
Implementation of Association Rules Algorithm to Identify Popular Topping Combinations in Orders
Pub Date : 2024-02-01 DOI: 10.57152/predatecs.v1i2.863
Rizki Aulia Putra, Margareta Amalia Miranti Putri, Sri Maharani Sinaga, Sania Fitri Octavia, Raihan Catur Rachman
Association rule mining is a data mining technique for finding associative rules between combinations of items. This research applies association rule algorithms to identify popular topping combinations in food orders, with the aim of helping restaurant owners and food businesses understand their customers' preferences and optimize their menu offerings. The algorithms are applied to a dataset obtained from Kaggle to identify patterns or combinations of toppings that often appear together in orders. The results show that toppings paired with chocolate are popular in orders. These findings can provide valuable insights for food business owners in structuring their menus and determining attractive offers for customers. The study also compares the Apriori, FP-Growth, and Eclat algorithms; the best rule found is the combination of dill and unicorn toppings with chocolate, with 60% confidence. Overall, the Eclat algorithm provides the best performance in this study, with the fastest execution, while giving insight into customer preferences for topping combinations in food orders. Despite shortcomings in the form of the data, the study is expected to help business owners optimize their offerings, increase customer satisfaction, and improve their business performance.
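The confidence of a rule such as "dill and unicorn imply chocolate" is the support of all items together divided by the support of the antecedent alone. The sketch below shows that arithmetic on a toy order list; the orders (including the "caramel" item) are invented for illustration and were chosen so that the rule comes out at 60%, so they are not the paper's dataset. It computes the measures directly and does not implement Apriori, FP-Growth, or Eclat.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as a ratio of supports."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical orders, constructed for illustration only.
orders = [
    {"dill", "unicorn", "chocolate"},
    {"dill", "unicorn", "chocolate"},
    {"dill", "unicorn", "chocolate"},
    {"dill", "unicorn"},
    {"dill", "unicorn", "caramel"},
    {"chocolate"},
]

rule_confidence = confidence(orders, {"dill", "unicorn"}, {"chocolate"})
print(f"{rule_confidence:.0%}")
```

Here five of six orders contain both dill and unicorn, and three of those five also contain chocolate, giving 3/5. The three named algorithms differ in how they enumerate frequent itemsets, not in how support and confidence are defined.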
Citations: 0
Text Classification of Translated Qur'anic Verses Using Supervised Learning Algorithm
Pub Date : 2024-02-01 DOI: 10.57152/predatecs.v1i2.870
Dhea Ananda, Syahida Nurhidayarnis, Tiara Afrah Afifah, Muhammad Anang Ramadhan, Ilvan Mahendra
The Quran, comprising Allah's absolute divine messages, serves as guidance. Although reading the Quran with tafsir is beneficial, it may not give a comprehensive understanding of the entire message the Quran conveys. This is because the Quran addresses diverse topics within each surah, so readers must reference interconnected verses throughout the entire chapter for a holistic interpretation. However, given how extensive and varied the verses are, obtaining accurate translations for each verse can be complex and time-consuming. It is therefore useful to categorize the translated text of Quranic verses into distinct classes based on their primary content, using Fuzzy C-Means, Random Forest, and Support Vector Machine. Based on the Davies-Bouldin Index (DBI), cluster 9 emerges as the optimal cluster for classifying the QS An-Nisa data, with the lowest DBI value of 4.30. Random Forest is more accurate than SVM, achieving an accuracy of 66.37% against 50.56% for SVM.
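Since the abstract selects a clustering by its lowest Davies-Bouldin Index, a small sketch of that index may help. It uses the standard definition (average, over clusters, of the worst ratio of within-cluster scatter to centroid separation) and assumes hard cluster assignments and Euclidean distance; the study itself uses Fuzzy C-Means on text features, which this toy example does not reproduce. The point sets and the function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def davies_bouldin(clusters):
    """DBI for a list of clusters; lower values mean tighter, better separated clusters."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = []
    for i in range(k):
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / k


# Hypothetical 2-D points: same cluster shapes, different separation.
well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]

dbi_far = davies_bouldin(well_separated)
dbi_near = davies_bouldin(close_together)
print(dbi_far, dbi_near)
```

Moving the two clusters closer together raises the index, which is why a candidate clustering with the lowest DBI is normally preferred.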
Citations: 0