Annisa Ulizulfa, R. Kusumaningrum, K. Khadijah, Rismiyati Rismiyati
Deep learning has shown promising results in various text-based classification tasks. However, deep learning performance depends on the amount of data: when the dataset is small, deep learning algorithms do not perform well, and vice versa. Classical machine learning algorithms commonly work well with small datasets, but their performance reaches an optimal value and does not improve as the sample size increases. Therefore, this study aimed to compare the performance of classical machine learning and deep learning methods in detecting temperament from Indonesian Twitter data. The proposed Indonesian Linguistic Inquiry and Word Count was employed to analyze the context of the tweets. The classical machine learning methods implemented were support vector machine and K-nearest neighbor, whereas the deep learning method employed was a convolutional neural network (CNN) with three different architectures. Both learning methods were implemented using multiclass classification and one-versus-all (OVA) multiclass classification. The highest average F-measure was 58.73%, obtained by the CNN with OVA, a pool size of 3, a dropout value of 0.7, and a learning rate of 0.0007.
Title: Temperament detection based on Twitter data: classical machine learning versus deep learning. Journal: International Journal of Advances in Intelligent Informatics. Published: 2022-03-31. DOI: https://doi.org/10.26555/ijain.v8i1.692
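The average F-measure reported above can be illustrated with a minimal sketch that scores each class in a one-versus-all fashion (this class against all others) and then averages the per-class values. The abstract does not list the temperament classes or the exact averaging scheme, so the class names, the sample labels, and the unweighted (macro) average below are assumptions for illustration only.

```python
def ova_f_measures(y_true, y_pred):
    """Per-class F-measure, treating each class as 'this class vs. all others'."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F-measure is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def average_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F-measures (macro average)."""
    scores = ova_f_measures(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical gold labels and predictions; not data from the study.
y_true = ["sanguine", "choleric", "melancholic", "phlegmatic", "sanguine", "choleric"]
y_pred = ["sanguine", "choleric", "phlegmatic", "phlegmatic", "choleric", "choleric"]

per_class = ova_f_measures(y_true, y_pred)
macro_f = average_f_measure(y_true, y_pred)
```

A class that is never predicted correctly (here "melancholic") contributes an F-measure of zero, which is one reason a macro average can sit well below overall accuracy on imbalanced data.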
Norhidayah Othman, Cik Feresa Mohd Foozy, Aida Mustapha, S. Mostafa, Shamala Palaniappan, Shafiza Ariffin Kashinath
A traffic summons, also known as a traffic ticket, is a notice issued by a law enforcement official to a motorist, that is, a person who drives a car, lorry, or bus or who rides a motorcycle. This study performs a comparative experiment on three classification algorithms (Naïve Bayes, Gradient Boosted Trees, and a Deep Learning algorithm) for classifying traffic violation types. The performance of all three classification models developed in this work is measured and compared. The results show that the Gradient Boosted Trees and Deep Learning algorithms have the best accuracy and recall but low precision. Naïve Bayes, on the other hand, has high recall since it is a picky classifier that only performs well in a dataset that is high in precision. The results of this paper could serve as a baseline for investigations related to the classification of traffic violation types. They are also helpful for authorities in planning ways to reduce traffic violations among road users by studying the most common traffic violation types in an area, whether a citation, a warning, or an ESERO (Electronic Safety Equipment Repair Order).
Title: A data mining approach for classification of traffic violations types. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.708
Azri Azrul Azmer, Norlida Hassan, Shihab Hamad Khaleefah, S. Mostafa, A. A. Ramli
Texture is the appearance of an object's surface, which varies in pattern and size. It is useful in many applications, including object recognition, fingerprinting, and surface analysis. The goal of this research is to identify the best classification model among the Naive Bayes (NB), Random Forest (RF), and k-Nearest Neighbor (k-NN) algorithms for texture classification. The algorithms classify leaf and urban land cover textures and are compared using several evaluation criteria. This research aims to show that texture data can be accurately classified into target groups based on texture characteristics and to find out which classification algorithm performs better when analyzing texture patterns. The test results show that the NB algorithm has the best overall accuracy: 78.67% for the leaves dataset and 93.60% for the urban land cover dataset.
Title: Comparative analysis of classification techniques for leaves and land cover texture. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.706
R. Kusumaningrum, Iffa Zainan Nisa, Rizka Putri Nawangsari, A. Wibowo
Currently, there are a large number of hotel reviews on the Internet that need to be evaluated to turn the data into actionable information. Deep learning has excellent capabilities for recognizing this type of data. With the advances in deep learning paradigms, many algorithms have been developed that can be used in sentiment analysis tasks. In this study, we aim to compare the performance of classical machine learning algorithms, namely logistic regression (LR), naïve Bayes (NB), and support vector machine (SVM), using the Word2Vec model, with that of a deep learning algorithm, a convolutional neural network (CNN), in classifying hotel reviews on the Traveloka website into positive or negative classes. Both learning methods apply hyperparameter tuning to determine the parameters that produce the best model. Furthermore, the Word2Vec model uses the skip-gram architecture, hierarchical softmax evaluation, and 100 vector dimensions. The highest average accuracy obtained was 98.08%, using the CNN with a dropout of 0.2, Tanh as the convolution activation, softmax as the output activation, and Adam as the optimizer. The findings demonstrate that the integration of the Word2Vec model and the CNN model obtains significantly better accuracy than the classical machine learning methods.
Title: Sentiment analysis of Indonesian hotel reviews: from classical machine learning to deep learning. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.737
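The skip-gram architecture mentioned in the abstract trains on (center word, context word) pairs drawn from a sliding window. The sketch below only shows how such pairs are generated; it does not train embeddings or implement hierarchical softmax. The window size and the sample review are assumptions, since the abstract does not state them.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized Indonesian review ("room clean and comfortable").
review = ["kamar", "bersih", "dan", "nyaman"]
pairs = skipgram_pairs(review, window=1)
```

In a full pipeline, a Word2Vec implementation would learn one 100-dimensional vector per vocabulary word from pairs like these, and the resulting vectors would feed the classifiers being compared.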
The blast furnace is the principal method of producing cast iron. In the production of cast iron, the control of silicon is vital because this impurity is harmful to almost all steels. Artificial neural networks with Bayesian regularization are more robust than traditional back-propagation networks and can reduce or eliminate the need for tedious cross-validation. Bayesian regularization is a mathematical process that converts a nonlinear regression into a "well-posed" statistical problem in the manner of ridge regression. The main objective of this work was to develop an artificial neural network to predict silicon content in hot metal, varying the number of neurons in the hidden layer among 10, 20, 25, 30, 40, 50, 75, and 100. The results show that all neural networks converged and presented reliable results; the networks with 20, 25, and 30 neurons showed the best overall results. In short, Bayesian neural networks can be used in practice because the actual values correlate excellently with the values calculated by the neural network.
Title: Prediction of silicon content in the hot metal using Bayesian networks and probabilistic reasoning. Authors: W. Cardoso, R. Felice. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.771
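The analogy with ridge regression drawn in the abstract can be made concrete with the standard textbook form of the Bayesian-regularized objective. This is a sketch of the usual formulation, not necessarily the exact one used by the authors; the symbols $\alpha$, $\beta$, $E_D$, and $E_W$ are introduced here for illustration.

```latex
% Regularized training objective: data misfit plus a weight penalty
F(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}),
\qquad
E_D(\mathbf{w}) = \tfrac{1}{2}\sum_{i=1}^{N} \bigl(t_i - y(\mathbf{x}_i;\mathbf{w})\bigr)^2,
\qquad
E_W(\mathbf{w}) = \tfrac{1}{2}\sum_{j} w_j^2 .

% Ridge regression is the linear special case y(\mathbf{x};\mathbf{w}) = \mathbf{w}^{\top}\mathbf{x}:
\min_{\mathbf{w}} \; \sum_{i=1}^{N} \bigl(t_i - \mathbf{w}^{\top}\mathbf{x}_i\bigr)^2
  + \lambda \sum_{j} w_j^2,
\qquad \lambda = \frac{\alpha}{\beta}.
```

In the Bayesian treatment, $\alpha$ and $\beta$ are estimated from the training data rather than chosen on a held-out set, which is the sense in which the need for cross-validation is reduced.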
Nurul Bashirah Ghazali, Dang Fillatina Hashim, F. Che Seman, K. Isa, K. N. Ramli, Z. Abidin, S. Mustam, Mohammed Al Haek
Asymmetrical Digital Subscriber Line (ADSL) technology is widely deployed worldwide, but its performance may be limited by its intrinsic characteristics. The copper cable is by nature susceptible to signal degradation and line faults. Common ADSL line faults are short-wired faults, open-wired faults, bridge taps, and uneven pairs. Nevertheless, ADSL is still one of the most established network technologies, and users in suburban areas still depend on it to access internet service. This paper discusses and compares machine learning algorithms, namely Decision Trees (J48), K-Nearest Neighbor, Multilayer Perceptron, Naïve Bayes, Random Forest, and Sequential Minimal Optimization (SMO), for classifying ADSL line impairments that affect line operation performance, in terms of their percentage accuracy. Of these algorithms, Random Forest gives the highest overall accuracy on the ADSL line impairment dataset. The best algorithm for classifying ADSL line impairment is chosen based on the highest accuracy percentage. Classifying fault types in the ADSL copper access network may benefit telecommunication network providers by allowing them to assess the network condition remotely rather than on-site.
Title: Cable fault classification in ADSL copper access network using machine learning. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.488
This study compares cyber insurance premiums with and without the effect of frequency in a communication network. As a cybersecurity factor, the frequency in a communication network influences the speed of cyberattack transmission: a network or node with high activity is more vulnerable than one with low activity. Traditionally, cyber insurance pricing relies on historical data to set premiums or rates. Alternatively, the network security level can be evaluated using Monte Carlo simulation based on an epidemic model. This simulation requires spreading parameters, such as the infection rate, recovery rate, and self-infection rate. Our idea is to modify the infection rate as a function of the frequency in a communication network. The node-based model uses probability distributions for the communication mechanism to generate the data. It adopts the co-purchase network formation from market basket analysis to build weighted edges and nodes. Simulations are used to compare the initial and modified infection rates. This paper considers prism and Petersen graph topologies as case studies. The relative difference is used as a metric of the significance of the premium adjustment. The results show that the premium for a node with a low frequency level in a communication network can be up to 28.28% lower than the initial premium. For a network, the premium can be up to 20.99% lower than the initial network premium. Based on these results, insurance companies can adjust cyber insurance premiums based on computer usage to offer a more appropriate price.
Title: Adjusting cyber insurance premiums based on frequency in a communication network. Authors: S. Indratno, Y. Antonio, S. Saputro. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.698
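The simulation idea in the abstract can be sketched as a small Monte Carlo run of an epidemic model with self-infection on the Petersen graph, followed by the relative-difference metric. Several things here are assumptions rather than the authors' method: the rates are treated as per-step probabilities in a discrete-time model, the parameter values are arbitrary, the mean infected fraction stands in for the risk that a premium would be based on, and no premium formula is implemented.

```python
import random


def petersen_graph():
    """Adjacency sets of the Petersen graph (10 nodes, 15 edges, every node of degree 3)."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        edges = [
            (i, (i + 1) % 5),          # outer 5-cycle
            (i, i + 5),                # spokes
            (i + 5, (i + 2) % 5 + 5),  # inner pentagram
        ]
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_infected_fraction(adj, beta, delta, epsilon, steps=100, runs=100, seed=0):
    """Average fraction of infected nodes at the end of each run.

    beta: per-step infection probability per infected neighbour (assumed form)
    delta: per-step recovery probability
    epsilon: per-step self-infection probability
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        infected = {v: False for v in adj}
        for _ in range(steps):
            nxt = {}
            for v in adj:
                if infected[v]:
                    nxt[v] = rng.random() >= delta  # stays infected unless it recovers
                else:
                    k = sum(infected[u] for u in adj[v])
                    stay_healthy = (1 - epsilon) * (1 - beta) ** k
                    nxt[v] = rng.random() < 1 - stay_healthy
            infected = nxt
        total += sum(infected.values()) / len(adj)
    return total / runs


def relative_difference(modified, initial):
    """Signed relative change of the modified value with respect to the initial one."""
    return (modified - initial) / initial


graph = petersen_graph()
initial_risk = mean_infected_fraction(graph, beta=0.5, delta=0.2, epsilon=0.01)
# Hypothetical low-frequency case: no neighbour-to-neighbour transmission at all.
low_freq_risk = mean_infected_fraction(graph, beta=0.0, delta=0.2, epsilon=0.01)
change = relative_difference(low_freq_risk, initial_risk)
```

A negative `change` corresponds to the direction reported in the abstract, where lower communication frequency leads to a lower premium.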
Wirapong Chansanam, Kulthida Tuamsuk, Kanyarat Kwiecien, Sam Oh
This research aimed to study and analyze the influence and impact of Korean popular culture (K-pop) on Thai society. In this study, we used Social Network Analysis (SNA) to analyze streaming data obtained from a variety of YouTube channels belonging to YouTubers across the world; text analytics to analyze demographic characteristics, YouTubers' presentation techniques, and subscriber behavior; and multiple correlation analysis to analyze the relationships between factors affecting YouTube channels in Thailand. The findings revealed that five Thai YouTube channels were influencing Thai society. Furthermore, there were robust positive correlations between the number of dislikes and the number of comments (0.79) and between the number of likes and the number of comments (0.65). Additionally, there was a positive correlation between the number of views and the number of dislikes, and another between the number of likes and the number of dislikes. Future research can supplement the present findings with other social media sources to yield an even more diverse and comprehensive analysis. These analytics can be applied to various situations, including corporate marketing strategies, political campaigns, or disease/symptom analysis in medicine. This research contributes to social computing by revealing intelligent trends in social networks.
Title: Korean popular culture analytics in social media streaming: evidence from YouTube channels in Thailand. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.769
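Correlation values such as the 0.79 and 0.65 reported above are typically read on a scale from -1 to 1. The sketch below computes a Pearson correlation coefficient; the abstract does not name the coefficient that was used, so Pearson is an assumption, and the per-video counts are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-video counts; not data from the study.
dislikes = [3, 10, 4, 25, 8]
comments = [40, 90, 70, 300, 60]
r = pearson(dislikes, comments)
```

A coefficient close to 1 means videos with more dislikes also tend to attract more comments; it does not by itself say which of the two drives the other.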
This paper proposes a marker detection method for estimating the angle and distance of an underwater remotely operated vehicle (ROV) relative to a buoyant boat. To keep the ROV aligned with the boat, a marker and a visual recognition system are designed. The marker is placed facing down under the boat, and a method is developed to recognize the angle and distance of the marker from an upward-facing camera on the ROV. Because of constraints on space, payload, heat dissipation, and buoyancy in a micro-class ROV, the options for computing power are limited. This challenge demands a lightweight visual recognition technique for small computers. The proposed method consists of two steps. The marker design step explains how the marker is constructed from simple components. The marker recognition step is based on image processing that uses thresholding and blob filtering; blob size and blob circularity filters are used to eliminate unwanted information. The main advantage of this method is real-time orientation and distance estimation using a single camera. The proposed method was tested using an 11 cm × 11 cm marker. The marker is detected at a rate of 90% and can be detected up to 120 cm from the camera. The marker can be tilted up to 50° and still has an 80% detection rate. The method estimates the marker rotation angle with a 1.75° average error and the distance between the marker and the camera with a -0.62 cm average error. The blob filter is also shown to be superior to a regular dilation and erosion method.
Title: Development of marker detection method for estimating angle and distance of underwater remotely operated vehicle to buoyant boat. Authors: Muhammad Qomaruz Zaman, R. Mardiyanto. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.455
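The blob size and circularity filters described in the abstract can be sketched with the common circularity measure 4πA/P², which equals 1 for a perfect disc and drops toward 0 for elongated shapes. The abstract gives neither the thresholds nor the shape of the marker components, so the threshold values, the blob dictionaries, and the assumption that the wanted blobs are roughly circular are all hypothetical.

```python
import math


def circularity(area, perimeter):
    """Circularity 4*pi*A / P^2: 1.0 for a perfect disc, smaller for other shapes."""
    return 4 * math.pi * area / perimeter ** 2


def filter_blobs(blobs, min_area, max_area, min_circularity):
    """Keep blobs whose area lies within bounds and whose circularity is high enough."""
    kept = []
    for blob in blobs:
        in_size_range = min_area <= blob["area"] <= max_area
        round_enough = circularity(blob["area"], blob["perimeter"]) >= min_circularity
        if in_size_range and round_enough:
            kept.append(blob)
    return kept


# Hypothetical blobs, as if measured on a thresholded image (units: pixels).
blobs = [
    {"name": "disc", "area": math.pi * 10 ** 2, "perimeter": 2 * math.pi * 10},
    {"name": "speck", "area": 4.0, "perimeter": 8.0},       # too small
    {"name": "streak", "area": 200.0, "perimeter": 204.0},  # 100 x 2 rectangle, too elongated
]
kept = filter_blobs(blobs, min_area=50, max_area=5000, min_circularity=0.8)
```

Both tests are simple arithmetic on per-blob measurements, which fits the abstract's requirement for a lightweight technique on a small onboard computer.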
K-nearest neighbors (KNN) has been extensively used as an imputation algorithm to substitute missing data with plausible values. One of the strengths of KNN imputation is its ability to robustly estimate missing values from their nearest neighbors. However, despite these favorable points, KNN still has drawbacks: it suffers from high time complexity, the difficulty of choosing the right k, and sensitivity to the choice of distance function. Thus, this paper proposes a novel method for the imputation of missing data, named KNNGOA, which optimizes the KNN imputation technique using the grasshopper optimization algorithm (GOA). The GOA is designed to find the best value of k and to optimize the imputed value from KNN so as to maximize imputation accuracy. The experimental evaluation was conducted on different types of datasets collected from UCI, with missing-value rates of 10%, 30%, and 50%. In these experiments, the proposed algorithm achieved promising results and outperformed other methods, especially in terms of accuracy.
Title: An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data. Authors: Nadzurah Zainal Abidin, Amelia Ritahani Ismail. Journal: International Journal of Advances in Intelligent Informatics. Published: 2021-11-30. DOI: https://doi.org/10.26555/ijain.v7i3.696
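The baseline that KNNGOA builds on, plain KNN imputation, can be sketched as follows: a missing entry is filled with the mean of that column over the k nearest complete rows. This sketch does not include the grasshopper optimization step; k is fixed by hand here, which is exactly the choice the abstract says the GOA is meant to automate. The Euclidean distance on observed columns and the sample data are assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows.

    Assumes at least one complete row exists; distance is Euclidean,
    measured only on the columns observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbors = sorted(complete, key=dist)[:k]
        filled = [
            v if v is not None else sum(n[i] for n in neighbors) / len(neighbors)
            for i, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Hypothetical numeric dataset with one missing value in the last row.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [8.0, 9.0, 10.0],
    [1.05, None, 3.1],
]
result = knn_impute(data, k=2)
```

With k=2 the distant third row is ignored; with k=3 it would pull the imputed value upward, which illustrates why the choice of k matters for imputation accuracy.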