Recommendation System of Food Package Using Apriori and FP-Growth Data Mining Methods
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.3.454-462
Christofer Satria, Anthony Anggrawan, Mayadi
Roadside stalls are currently among the most popular places to eat. A roadside stall generally sells many kinds of food, drinks, and snacks. The problem is that, with a menu approaching hundreds of items, stall owners have difficulty determining which best-selling items to combine into menu packages. Data mining of roadside stall sales data is therefore needed to explore correlations and sales transaction patterns for the food items most often sold together. This study analyzes the most frequent item sets in food stall sales data using the Frequent Pattern Growth (FP-Growth) and Apriori data mining methods to recommend which foods and beverages make the best-selling menu packages. On 980 transactions, with a minimum support of 20% and a minimum confidence of 50%, FP-Growth produced eight valid rules; Apriori produced five valid rules as menu package recommendations. A two-month sales trial of the recommended menu packages showed that total sales increased significantly, up to 2.37 times the previous sales.
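The support and confidence thresholds in the food package abstract above can be illustrated with a small sketch. It enumerates item pairs by brute force rather than implementing Apriori or FP-Growth, and the function name `pair_rules` and the toy transactions are assumptions made for illustration, not data from the study.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support(A, B)    = transactions containing both A and B / all transactions
    confidence(A->B) = transactions containing both A and B / transactions containing A
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))  # ignore duplicate items within one transaction
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy data, not taken from the paper.
toy_transactions = [
    ["fried rice", "iced tea"],
    ["fried rice", "iced tea", "crackers"],
    ["noodles", "iced tea"],
    ["fried rice", "crackers"],
    ["noodles", "coffee"],
]
rules = pair_rules(toy_transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Apriori and FP-Growth differ in how they find frequent item sets efficiently; the filtering of rules by minimum support and minimum confidence is the part sketched here.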
Automated Resource Management System Based upon Container Orchestration Tools Comparison
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.3.501-509
B. Purahong, J. Sithiyopasakul, P. Sithiyopasakul, A. Lasakul, C. Benjangkaprasert
This article studies and analyzes the container orchestration technologies Kubernetes, Docker Swarm, and Apache Mesos by evaluating their performance and inspecting how many requests and responses the server can handle. Managing information system resources is a challenge in terms of performance, usability, reliability, and cost. Some orchestration tools cannot automatically allocate resources according to the scope of the information system's resource management, which leads to allocating more resources than the system requires and results in excessive costs. Therefore, this article proposes testing the system with a structured process that measures variables such as the number of requests per second, the number of responses to requests, and the resource extension period across all three orchestration technologies. Testing and analyzing these three variables makes it possible to determine the efficiency of Kubernetes in a comparable environment and to compare it with Docker Swarm and Apache Mesos. For Kubernetes, Docker Swarm, and Apache Mesos, the mean number of requests handled per minute is 30,677.25, 33,688.67, and 29,682.6, respectively. Docker Swarm handled more requests per minute, with a percentage difference of 9.35% compared to Kubernetes and 12.64% compared to Apache Mesos. However, each orchestration tool has its own strengths and weaknesses, which should be taken into consideration. The test setup can display information on a dashboard for visualization and analysis, and the article closes with a discussion of which container orchestration tool best suits which business purpose.
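The percentages quoted in the orchestration abstract above can be checked with a short calculation. This sketch assumes they are symmetric percentage differences (absolute difference divided by the mean of the two values); the abstract does not state the formula, and the function name is illustrative. Under that assumption the results come out at roughly 9.36% and 12.64%, close to the reported 9.35% and 12.64%.

```python
def percent_difference(a, b):
    """Symmetric percentage difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100


# Mean requests per minute as reported in the abstract.
kubernetes = 30677.25
docker_swarm = 33688.67
apache_mesos = 29682.6

swarm_vs_kubernetes = percent_difference(docker_swarm, kubernetes)
swarm_vs_mesos = percent_difference(docker_swarm, apache_mesos)

print(f"Swarm vs Kubernetes: {swarm_vs_kubernetes:.2f}%")
print(f"Swarm vs Mesos:      {swarm_vs_mesos:.2f}%")
```

A plain relative increase over the slower tool (difference divided by the smaller value) would give larger figures, about 9.8% and 13.5%, which is why the symmetric form is assumed here.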
A Literature Review on the Pervasiveness of Ransomware Threats and Attacks in the Philippines
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.630-638
E. Blancaflor, Joselito Lizer C. Daluz, Roduel Adrian G. Garcia, Nathan Gadiel S. Monton, Jhoana Marie S. Vergara
HASumRuNNer: An Extractive Text Summarization Optimization Model Based on a Gradient-Based Algorithm
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.656-667
Muljono, M. Nababan, R. A. Nugroho, Kevin Djajadinata
This article addresses text summarization, the task of condensing material in a way that directly communicates the intent or message of a document. The model proposed in this study is Hierarchical Attention SumRuNNer (HASumRuNNer), an extractive text summarization model for the Indonesian language. It is a novel contribution to Indonesian extractive summarization, as there is currently very little related research in terms of either approach or dataset. Three primary methods were used to build the model: BiGRU, CharCNN, and hierarchical attention mechanisms. The proposed model is optimized with a variety of gradient-based methods, and the ROUGE-N metric is used to evaluate the generated summaries. The test results show that the Adam gradient-based optimizer is the most effective for extractive summarization with the HASumRuNNer model: its ROUGE-1 (70.7), ROUGE-2 (64.33), and ROUGE-L (68.14) scores are higher than those of the other methods used as references. Combining BiGRU with CharCNN in the proposed HASumRuNNer model yields more accurate representations at both the word level and the sentence level. Additionally, the word-level and sentence-level hierarchical attention mechanisms help prevent the loss of information about individual words that is typically caused by long input words or sentences.
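The ROUGE-N scores mentioned in the HASumRuNNer abstract above measure n-gram overlap between a generated summary and a reference summary. The following is a minimal sketch of that overlap count, assuming whitespace tokenization and a single reference; it is not the evaluation code used in the study, and the example sentences are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of n-gram overlap between two strings."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Multiset intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example sentences.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

rouge_1 = rouge_n(candidate, reference, n=1)
rouge_2 = rouge_n(candidate, reference, n=2)
print("ROUGE-1 (P, R, F1):", rouge_1)
print("ROUGE-2 (P, R, F1):", rouge_2)
```

Published ROUGE implementations usually add stemming, multiple references, and ROUGE-L (longest common subsequence), none of which are covered by this sketch.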
Comparison of Shoulder Range of Motion Evaluation by Traditional and Semi-Automatic Methods
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.749-757
Sonia C Mantilla, C. Peña, G. G. Moreno
This article compares the results of two methods for measuring range of motion in the upper extremity. In a descriptive, cross-sectional study, measurements were made with goniometry, the traditional and most widely used method in the health field, and with a semi-automatic method developed using new technological tools. The results demonstrated that the technique has high sensitivity and can analyze both static body positions and the evolution of movements, which expands its range of applicability.
Authentication and Role-Based Authorization in Microservice Architecture: A Generic Performance-Centric Design
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.758-768
Randa Ahmad Al-Wadi, Adi Maaita
In a microservice-based system, each microservice is a stand-alone application that may be targeted individually to obtain unauthorized access. Consequently, authentication and authorization features are necessary. However, the related design decisions need to be made in a way that accommodates the scale of the system being developed. For example, a user may be authenticated with a password and authorized based on roles. In such a case, a single integrated authentication and role-based authorization microservice can be added, and the Application Programming Interfaces (APIs) associated with each role may be hard-coded as static API-level role authorization checks. A static relation between roles and APIs, however, makes their associations hard to modify when a microservice system contains a massive number of APIs. To make the relation dynamic, this paper presents a generic microservice-based architectural design with a separate role-based authorization microservice that stores role/API records in a database. It also reports performance optimization experiments carried out on the authentication and role-based authorization databases to make use of the suggested architectural design. The results for password-based authentication favored Not only SQL (NoSQL) databases for small microservice-based systems, which handle 1500 users or fewer, and Structured Query Language (SQL) databases for medium to large systems. Furthermore, the results indicated no difference between the two database types in the role-based authorization process at any API-based system scale level.
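The dynamic role/API relation described in the microservice abstract above can be sketched as a lookup against stored role/API records instead of hard-coded checks. This is a minimal illustration using an in-memory SQLite database; the table name `role_api`, its columns, the sample roles, and the API paths are all hypothetical and are not taken from the paper's design.

```python
import sqlite3


def build_store():
    """Create an in-memory table holding role/API association records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE role_api ("
        "role TEXT, method TEXT, path TEXT, "
        "PRIMARY KEY (role, method, path))"
    )
    conn.executemany(
        "INSERT INTO role_api VALUES (?, ?, ?)",
        [
            ("admin", "GET", "/orders"),
            ("admin", "DELETE", "/orders"),
            ("clerk", "GET", "/orders"),
        ],
    )
    return conn


def is_authorized(conn, roles, method, path):
    """Return True if any of the caller's roles is associated with the API."""
    if not roles:
        return False
    placeholders = ",".join("?" for _ in roles)
    row = conn.execute(
        "SELECT 1 FROM role_api WHERE method = ? AND path = ? "
        f"AND role IN ({placeholders}) LIMIT 1",
        (method, path, *roles),
    ).fetchone()
    return row is not None


conn = build_store()
before = is_authorized(conn, ["clerk"], "DELETE", "/orders")

# Changing an association is a data change, not a code change.
conn.execute("INSERT INTO role_api VALUES (?, ?, ?)", ("clerk", "DELETE", "/orders"))
after = is_authorized(conn, ["clerk"], "DELETE", "/orders")
print(before, after)
```

With static checks, granting the clerk role access to the delete API would require editing and redeploying the service that owns it; here it is a single inserted record.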
Linguistic Driven Feature Selection for Text Classification as Stop Word Replacement
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.796-802
Daniela M. Schönle, Christoph Reich, D. Abdeslam
Advances in the Development of an Algorithm for Parametric Identification of Egyptian Hieroglyphs Using Artificial Vision
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.4.788-795
Rafael Bolívar León, C. Peña, G. G. Moreno
This article presents the development of an algorithm for identifying Egyptian hieroglyphs written on papyri. Implementing parametric artificial vision techniques reduced the computational power the algorithm requires. The main morphological characteristics used in artificial vision were studied; some relevant ones were selected, and others were adapted so that they can be normalized and quantified quickly. The established characteristics were shown to allow the differentiation and identification of the hieroglyphs of the ancient Egyptian alphabet. An advantage of the developed algorithm is that it can differentiate characters regardless of their initial size.
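The hieroglyph abstract above mentions normalized morphological characteristics that work regardless of a character's initial size. As a rough illustration of what a size-independent shape descriptor looks like, the sketch below computes two common ones from a binary glyph: bounding-box aspect ratio and extent (filled pixels divided by bounding-box area). These descriptors and the toy glyph are assumptions chosen for illustration; the abstract does not say which characteristics the authors selected.

```python
def shape_features(grid):
    """Compute size-independent descriptors of a binary glyph (1 = ink)."""
    pixels = [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value
    ]
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return {
        "aspect_ratio": width / height,             # bounding-box shape
        "extent": len(pixels) / (width * height),   # fill ratio of the box
    }


def upscale(grid, k):
    """Enlarge a glyph by an integer factor k (each pixel becomes k x k)."""
    return [[v for v in row for _ in range(k)] for row in grid for _ in range(k)]


# Hypothetical toy glyph, not an actual hieroglyph.
glyph = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
small = shape_features(glyph)
large = shape_features(upscale(glyph, 3))
print(small)
print(large)
```

Both descriptors are ratios, so enlarging the glyph by an integer factor leaves them unchanged, which is the property a size-independent comparison relies on.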
Secure and Smart Teleradiology Framework Integrated with Technology-Based Fault Detection (CVT-FD)
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.5.941-949
Mustafa Sabah Mustafa, Mohammed Hasan Ali, Mustafa Musa Jaber, Amjad Rehman Khan, Narmine ElHakim, Tanzila Saba
The healthcare sector has used cyber-physical systems to provide high-quality patient treatment. The wide range of medical devices, mobile devices, and body sensor nodes creates many attack surfaces that need sophisticated security solutions. Cyber-physical systems rely on various processing technologies, so the technical methods involved are equally varied. To reduce fraud and medical mistakes, restricted access to these data and fault authentication must be implemented. Because these procedures require information management about problem identification and diagnosis at a complex level distinct from the technology itself, existing technologies need to be better suited to the task. This paper proposes a Computer Vision Technology-based Fault Detection (CVT-FD) framework for securely sharing healthcare data. When using a trusted device such as a mobile phone, end users can rest assured that their data is secure. Cyber-attack behaviour can be predicted using an Artificial Neural Network (ANN), and analyzing this data can assist healthcare professionals in making decisions. The experimental findings show that the model outperforms existing approaches in detection accuracy (98.3%), energy consumption (97.2%), attack prediction (96.6%), efficiency (97.9%), and delay ratio (35.6%).
Handling Class Imbalance in Google Cluster Dataset Using a New Hybrid Sampling Approach
Journal of Advances in Information Technology. Pub Date: 2023-01-01. DOI: 10.12720/jait.14.5.934-940
Jyoti Shetty, G. Shobha
Class imbalance is a classical problem in data mining, in which the classes in a dataset have a disproportionate number of instances. Most machine learning tasks fail to work properly with an imbalanced dataset. Various approaches exist to balance a dataset, but they suffer from issues such as overfitting and information loss. This manuscript proposes a novel and improved cluster-based undersampling method for handling two-class and multi-class imbalanced datasets. An ensemble learning algorithm integrated with the pre-processing technique is used to address the class imbalance problem. The proposed approach is tested on a publicly available imbalanced Google cluster dataset. With an imbalanced dataset, the F1-score of each class has to be checked. The existing approaches gave a poor F1-score for class 0, whereas the proposed algorithm gave balanced F1-scores of 0.97 for class 0 and 0.96 for class 1, an improvement of about 2% over the existing technique. Similarly, for the multi-class problem, the proposed algorithm gave balanced AUC values of 0.87, 0.83, and 0.97 for class 0, class 1, and class 2, respectively.
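The class imbalance abstract above notes that the F1-score must be checked for each class. The sketch below shows why, using made-up labels in which a classifier that always predicts the majority class reaches 80% accuracy while its F1-score for the minority class is zero. The function name and the toy labels are illustrative and unrelated to the Google cluster dataset.

```python
def per_class_f1(y_true, y_pred):
    """Return a dict mapping each class label to its F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


# Hypothetical imbalanced labels: 2 instances of class 0, 8 of class 1.
y_true = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1] * 10  # a classifier that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
print(f"F1 per class = {scores}")
```

Accuracy alone hides the failure on class 0; the per-class scores expose it, which is the kind of imbalance in results the abstract contrasts with its balanced 0.97 and 0.96.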