Deep learning and computer vision approaches, combined with the rapid evolution of Unmanned Aerial Vehicle (UAV) and drone technologies, have significantly advanced disaster management applications. This research studied a classification method for identifying disaster events from UAV images that is suitable for disaster monitoring. Convolutional Neural Network (CNN) models based on GoogLeNet, pretrained on the ImageNet and Places365 datasets, were explored to find the most appropriate one to fine-tune for classifying disaster events. To obtain optimal performance, a systematic configuration for searching the hyperparameters when fine-tuning the CNN model was proposed. The three hyperparameters that most affect performance, namely the initial learning rate, the number of epochs, and the minibatch size, were systematically set and tuned for each configuration. The proposed approach consists of five stages, during which three types of trials monitor different sets of hyperparameters. The experiments revealed that applying the proposed approach can increase model performance by up to 5%, with an optimal accuracy of 98.77%. For UAV/drone applications, where a small onboard model is preferred, GoogLeNet is well suited for deployment: its model size is small and its structure lends itself to further fine-tuning.
{"title":"Systematic Configuration for Hyperparameters Optimization in Transferring of CNN Model to Disaster Events Classification from UAV Images","authors":"Supaporn Bunrit, Nittaya Kerdprasop, Kittisak Kerdprasop","doi":"10.18178/joig.11.3.263-270","DOIUrl":"https://doi.org/10.18178/joig.11.3.263-270","url":null,"abstract":"Deep learning and computer vision-based approaches incorporated with the evolution of the relevant technologies of Unmanned Aerial Vehicles (UAVs) and drones have significantly motivated the advancements of disaster management applications. This research studied a classification method for disaster event identification from UAV images that is suitable for disaster monitoring. A Convolution Neural Network (CNN) of GoogleNet models that were pretrained from ImageNet and Place365 datasets was explored to find the appropriate one for fine-tuning to classify the disaster events. In order to get the optimal performance, a systematic configuration for searching the hyperparameters in fine-tuning the CNN model was proposed. The top three hyperparameters that affect the performance, which are the initial learning rate, the number of epochs, and the minibatch size, were systematically set and tuned for each configuration. The proposed approach consists of five stages, during which three types of trials were used to monitor different sets of the hyperparameters. The experimental result revealed that by applying the proposed approach the model performance can increase up to 5%. The optimal performance achieved was 98.77 percent accuracy. For UAV/drone applications, where a small onboard model is preferred, GoogleNet that is quite small in model size and has a good structure for further fine tuning is suitable to deploy.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74403775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient Backbone for Early Forest Fire Detection Based on Convolutional Neural Networks
D. Mahanta, D. Hazarika, V. K. Nath
Pub Date: 2023-09-01. DOI: 10.18178/joig.11.3.227-232
Forest fires cause disastrous damage to both human life and ecosystems, so it is essential to detect them at an early stage to reduce the damage. Convolutional Neural Networks (CNNs) are widely used for forest fire detection. This paper proposes a new backbone network for a CNN-based forest fire detection model. The proposed backbone detects plumes of smoke well by decomposing the conventional convolution into depth-wise and coordinate convolutions, which better extract information from objects that spread along the vertical dimension. Experimental results show that the proposed backbone outperforms other popular ones, achieving a detection performance of up to 52.6 AP.
{"title":"An Efficient Backbone for Early Forest Fire Detection Based on Convolutional Neural Networks","authors":"D. Mahanta, D. Hazarika, V. K. Nath","doi":"10.18178/joig.11.3.227-232","DOIUrl":"https://doi.org/10.18178/joig.11.3.227-232","url":null,"abstract":"Forest fires cause disastrous damage to both human life and ecosystem. Therefore, it is essential to detect forest fires in the early stage to reduce the damage. Convolutional Neural Networks (CNNs) are widely used for forest fire detection. This paper proposes a new backbone network for a CNN-based forest fire detection model. The proposed backbone network can detect the plumes of smoke well by decomposing the conventional convolution into depth-wise and coordinate ones to better extract information from objects that spread along the vertical dimension. Experimental results show that the proposed backbone network outperforms other popular ones by achieving a detection accuracy of up to 52.6 AP.1","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88046094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel medical image encryption technique is proposed based on DNA encoding and decoding combined with a logistic map. The approach is shown to encrypt highly sensitive medical images with 100 percent integrity, that is, negligible data loss, and is tested on both high- and low-resolution images. The proposed technique consists of two levels of diffusion that exploit the actual structure of DNA. In the first level of diffusion, DNA encoding and decoding operations generate a DNA sequence for each pixel. The originality of the work lies in using a long DNA structure, stored in a text file kept at both the sender's and the receiver's end, to improve the performance of the proposed method. For each pixel's DNA sequence, index values are obtained by a search operation on this DNA structure; the index values are then modified for use in the next diffusion process. In the second level of diffusion, a highly chaotic logistic map is iterated to generate sequences from which chaotic values are extracted to form the cipher images. Correlation coefficient analysis, histogram analysis, entropy analysis, NPCR, and UACI all show significant results. Therefore, the proposed technique can play an important role in securing low-resolution medical images as well as other highly sensitive visible images.
{"title":"An Enhanced Security in Medical Image Encryption Based on Multi-level Chaotic DNA Diffusion","authors":"Mousumi Gupta, Snehashish Bhattacharjee, Biswajoy Chatterjee","doi":"10.18178/joig.11.2.153-160","DOIUrl":"https://doi.org/10.18178/joig.11.2.153-160","url":null,"abstract":"A novel medical image encryption technique has been proposed based on the features of DNA encodingdecoding in combination with Logistic map approach. The approach is proven for encryption of highly sensitive medical images with 100 percent integrity or negligible data loss. Testing is done on both high and low-resolution images. Proposed encryption technique consists of two levels of diffusion using the actual structure of the DNA. In the first level of diffusion process, we have used DNA encoding and decoding operations to generate DNA sequence of each pixel. The originality of the work is to use a long DNA structure stored in a text file stored on both sender and receiver’s end to improve the performance of the proposed method. In this initial level of diffusion, DNA sequences are generated for each pixe-land in each of the DNA sequence. Index values are obtained by employing a search operation on the DNA structure. This index values are further modified and ready to be used for next diffusion process. In the second level diffusion, a highly chaotic logistic map is iterated to generate sequences and is employed to extract the chaotic values to form the cipher images. The correlation coefficient analysis, Histogram analysis, Entropy analysis, NPCR, and UACI exhibit significant results. Therefore; the proposed technique can play an important role in the security of low-resolution medical images as well as other visible highly sensitive images.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84970956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile Dermatoscopy: Class Imbalance Management Based on Blurring Augmentation, Iterative Refining and Cost-Weighted Recall Loss
Nauman Ullah Gilal, Samah Ahmed Mustapha Ahmed, J. Schneider, Mowafa J Househ, Marco Agus
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.161-169
We present an end-to-end framework for real-time melanoma detection on mole images acquired with mobile devices equipped with off-the-shelf magnifying lenses. We trained our models via transfer learning with EfficientNet convolutional neural networks on the public-domain International Skin Imaging Collaboration ISIC-2019 and ISIC-2020 datasets. To reduce the class imbalance issue, we integrated the standard training pipeline with schemes for effective data balancing using oversampling and iterative cleaning through loss ranking. We also introduce a blurring scheme that emulates the aberrations produced by commonly available magnifying lenses, and a novel loss function incorporating the difference in cost between false positive (melanoma misses) and false negative (benign misses) predictions. Through preliminary experiments, we show that our framework can create models for real-time mobile inference with a controlled trade-off between false positive rate and false negative rate. The performances obtained on the ISIC-2020 dataset are: accuracy 96.9%, balanced accuracy 98%, ROC AUC 0.98, benign recall 97.7%, malignant recall 97.2%.
{"title":"Mobile Dermatoscopy: Class Imbalance Management Based on Blurring Augmentation, Iterative Refining and Cost-Weighted Recall Loss","authors":"Nauman Ullah Gilal, Samah Ahmed Mustapha Ahmed, J. Schneider, Mowafa J Househ, Marco Agus","doi":"10.18178/joig.11.2.161-169","DOIUrl":"https://doi.org/10.18178/joig.11.2.161-169","url":null,"abstract":"We present an end-to-end framework for real-time melanoma detection on mole images acquired with mobile devices equipped with off-the-shelf magnifying lens. We trained our models by using transfer learning through EfficientNet convolutional neural networks by using public domain The International Skin Imaging Collaboration (ISIC)-2019 and ISIC-2020 datasets. To reduce the class imbalance issue, we integrated the standard training pipeline with schemes for effective data balance using oversampling and iterative cleaning through loss ranking. We also introduce a blurring scheme able to emulate the aberrations produced by commonly available magnifying lenses, and a novel loss function incorporating the difference in cost between false positive (melanoma misses) and false negative (benignant misses) predictions. Through preliminary experiments, we show that our framework is able to create models for real-time mobile inference with controlled tradeoff between false positive rate and false negative rate. The obtained performances on ISIC-2020 dataset are the following: accuracy 96.9%, balanced accuracy 98%, ROCAUC=0.98, benign recall 97.7%, malignant recall 97.2%.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"61 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82984587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Performance Analysis of Facial Expression Recognition System Using Local Regions and Features
Yining Yang, Vuksanovic Branislav, Hongjie Ma
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.104-114
Different parts of the face contribute in distinct ways to overall facial expressions such as anger, happiness, and sadness. This paper investigates how important different parts of the human face are to the accuracy of Facial Expression Recognition (FER). In machine learning, FER refers to the problem of training a computer vision system to automatically detect the facial expression in a presented facial image. It is a difficult image classification problem that is not yet fully solved and has received significant attention in recent years, mainly due to the growing number of possible applications in daily life. To establish the extent to which different face parts contribute to the overall expression, various sections were extracted from a set of facial images and used as inputs to three different FER systems. The recognition rates obtained for each facial section confirm that different regions of the face have different levels of importance for the accuracy achieved by an associated FER system.
{"title":"The Performance Analysis of Facial Expression Recognition System Using Local Regions and Features","authors":"Yining Yang, Vuksanovic Branislav, Hongjie Ma","doi":"10.18178/joig.11.2.104-114","DOIUrl":"https://doi.org/10.18178/joig.11.2.104-114","url":null,"abstract":"Different parts of our face contribute to overall facial expressions, such as anger, happiness and sadness in distinct ways. This paper investigates the degree of importance of different human face parts to the accuracy of Facial Expression Recognition (FER). In the context of machine learning, FER refers to a problem where a computer vision system is trained to automatically detect the facial expression from a presented facial image. This is a difficult image classification problem that is not yet fully solved and has received significant attention in recent years, mainly due to the increased number of possible applications in daily life. To establish the extent to which different human face parts contribute to overall facial expression, various sections have been extracted from a set of facial images and then used as inputs into three different FER systems. In terms of the recognition rates for each facial section, this result confirms that various regions of the face have different levels of importance regarding the accuracy rate achieved by an associated FER system.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90231457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ResMLP_GGR: Residual Multilayer Perceptrons-Based Genotype-Guided Recurrence Prediction of Non-small Cell Lung Cancer
Yang Ai, Yinhao Li, Yen-Wei Chen, Panyanat Aonpong, Xianhua Han
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.185-194
Non-small Cell Lung Cancer (NSCLC) is among the malignant tumors with the highest morbidity and mortality. The postoperative recurrence rate in patients with NSCLC is high, which directly endangers their lives. In recent years, many studies have used Computed Tomography (CT) images to predict NSCLC recurrence. Although this approach is inexpensive, its prediction accuracy is low. Gene expression data can achieve high accuracy, but gene acquisition is expensive and invasive and cannot meet the recurrence-prediction needs of all patients. In this study, a low-cost, high-accuracy residual multilayer perceptron-based genotype-guided recurrence (ResMLP_GGR) prediction method is proposed that uses a gene estimation model to guide recurrence prediction. First, a gene estimation model is proposed that constructs a mapping function from mixed features (handcrafted and deep) to gene data, estimating the genetic information of tumor heterogeneity. Then, from the gene estimates obtained by this regression model, representations related to recurrence are learned to realize NSCLC recurrence prediction. In the testing phase, recurrence prediction is achieved from CT images alone. The experimental results show that the proposed method has few parameters, generalizes well, and is suitable for small datasets. Compared with state-of-the-art methods, it improves recurrence prediction accuracy by 3.39% with only 1% of the parameters.
{"title":"ResMLP_GGR: Residual Multilayer Perceptrons- Based Genotype-Guided Recurrence Prediction of Non-small Cell Lung Cancer","authors":"Yang Ai, Yinhao Li, Yen-Wei Chen, Panyanat Aonpong, Xianhua Han","doi":"10.18178/joig.11.2.185-194","DOIUrl":"https://doi.org/10.18178/joig.11.2.185-194","url":null,"abstract":"Non-small Cell Lung Cancer (NSCLC) is one of the malignant tumors with the highest morbidity and mortality. The postoperative recurrence rate in patients with NSCLC is high, which directly endangers the lives of patients. In recent years, many studies have used Computed Tomography (CT) images to predict NSCLC recurrence. Although this approach is inexpensive, it has low prediction accuracy. Gene expression data can achieve high accuracy. However, gene acquisition is expensive and invasive, and cannot meet the recurrence prediction requirements of all patients. In this study, a low-cost, high-accuracy residual multilayer perceptrons-based genotype-guided recurrence (ResMLP_GGR) prediction method is proposed that uses a gene estimation model to guide recurrence prediction. First, a gene estimation model is proposed to construct a mapping function of mixed features (handcrafted and deep features) and gene data to estimate the genetic information of tumor heterogeneity. Then, from gene estimation data obtained using a regression model, representations related to recurrence are learned to realize NSCLC recurrence prediction. In the testing phase, NSCLC recurrence prediction can be achieved with only CT images. The experimental results show that the proposed method has few parameters, strong generalization ability, and is suitable for small datasets. Compared with state-of-the-art methods, the proposed method significantly improves recurrence prediction accuracy by 3.39% with only 1% of parameters.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"71 9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83616358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Bit-Plane Domain 3D Oriented Arbitrary and Circular Shaped Scanning Patterns for Bio-Medical Image Retrieval
D. Mahanta, D. Hazarika, V. K. Nath
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.212-226
A new feature descriptor, the local bit-plane domain 3D oriented arbitrary and circular shaped scanning pattern (LB-3D-OACSP), is proposed for biomedical image retrieval. Unlike circular, zigzag, and other scanning structures, the LB-3D-OACSP descriptor calculates the association between a reference pixel and its surrounding pixels in the bit-plane domain of a 3D plane, using multi-directional 3D arbitrary and 3D circular shaped scanning patterns. In contrast to other scanning structures, the multi-directional 3D arbitrary shaped patterns provide more continual angular dissimilarity among the sampling positions, with the aim of capturing more frequent changes in local textures. A total of sixteen discriminative 3D arbitrary and 3D circular shaped patterns, oriented in various directions, are applied on a 3D plane constructed from the corresponding bit-planes of three multi-scale images. This ensures maximal extraction of inter-scale geometrical information and effectively captures non-uniform as well as uniform textures. The three multi-scale images are generated by processing the input image with Gaussian filter banks. Through this encoding of bit-planes, the LB-3D-OACSP descriptor captures most image textures, from very fine to coarse. The performance of LB-3D-OACSP is tested on three popular biomedical image databases in terms of % average retrieval precision (ARP) and % average retrieval recall (ARR). The experiments demonstrate an encouraging improvement in %ARP and %ARR compared with many existing state-of-the-art descriptors.
{"title":"Local Bit-Plane Domain 3D Oriented Arbitrary and Circular Shaped Scanning Patterns for Bio-Medical Image Retrieval","authors":"D. Mahanta, D. Hazarika, V. K. Nath","doi":"10.18178/joig.11.2.212-226","DOIUrl":"https://doi.org/10.18178/joig.11.2.212-226","url":null,"abstract":"A new feature descriptor called local bit-plane domain 3D oriented arbitrary and circular shaped scanning pattern (LB-3D-OACSP) is proposed for biomedical image retrieval in this study. Unlike the circular, zigzag and other scanning structures, the LB-3D-OACSP descriptor calculates the association between reference-pixel and its surrounding pixels in bit-plane domain in a 3D plane using multi-directional 3D arbitrary and 3D circular shaped scanning patterns. In contrast to other scanning structures, the multi-directional 3-D arbitrary shaped patterns provide more continual angular dissimilarity among the sampling positions with the aim to capture more frequent changes in the local textures. The total of sixteen number of discriminative 3D arbitrary and 3D circular shaped patterns oriented in various directions are applied on a 3D plane constructed using respective bit-planes of three multi-scale images which ensures the maximum extraction of inter-scale geometrical information across the scales which very effectively captures not only the uniform but non-uniform textures too. The multi-scale images are generated by processing the input image with Gaussian filter banks generating three multi-scale images. The LB-3D-OACSP descriptor is able to capture most of the very fine to coarse image textures through encoding of bit-planes. The performance of LB-3D-OACSP is tested on three popular biomedical image databases both in terms of % average retrieval precision (ARP) and % average retrieval recall (ARR). The experiments demonstrate an encouraging enhancement in terms of %ARP and %ARR as compared to many existing state of the art descriptors.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91355411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification Model Based on U-Net for Crack Detection from Asphalt Pavement Images
Y. Fujita, Taisei Tanaka, Tomoki Hori, Y. Hamamoto
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.121-126
The purpose of our study is to accurately detect cracks in asphalt pavement surface images, which include unexpected objects, non-uniform illumination, and surface irregularities. We propose a method to construct a classification Convolutional Neural Network (CNN) model based on a pre-trained U-Net, a well-known semantic segmentation model. First, we train the U-Net with a limited amount of asphalt pavement surface data obtained by a Mobile Mapping System (MMS). Then we use the encoder of the trained U-Net as a feature extractor for a classification model, which we train by fine-tuning. We report comparative evaluations against VGG11, ResNet18, and GoogLeNet, well-known models constructed by transfer learning from ImageNet, a large dataset of natural images. Experimental results show that our model achieves higher classification performance than those models. Our method is therefore effective for constructing a CNN model from a limited training dataset.
{"title":"Classification Model Based on U-Net for Crack Detection from Asphalt Pavement Images","authors":"Y. Fujita, Taisei Tanaka, Tomoki Hori, Y. Hamamoto","doi":"10.18178/joig.11.2.121-126","DOIUrl":"https://doi.org/10.18178/joig.11.2.121-126","url":null,"abstract":"The purpose of our study is to detect cracks accurately from asphalt pavement surface images, which includes unexpected objects, non-uniform illumination, and irregularities in surfaces. We propose a method to construct a classification Convolutional Neural Network (CNN) model based on the pre-trained U-Net, which is a well-known semantic segmentation model. Firstly, we train the U-Net with a limited amount of the asphalt pavement surface dataset which is obtained by a Mobile Mapping System (MMS). Then, we use the encoder of the trained U-Net as a feature extractor to construct a classification model, and train by fine-tuning. We describe comparative evaluations with VGG11, ResNet18, and GoogLeNet as well-known models constructed by transfer learning using ImageNet, which is a large size dataset of natural images. Experimental results show our model has high classification performance, compared to the other models constructed by transfer learning using ImageNet. Our method is effective to construct convolutional neural network model using the limited training dataset.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84582636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile Surveillance Siren Against Moving Object as a Support System for Blind People
D. H. Hareva, A. Sebastian, A. Mitra, Irene A. Lazarusli, C. Haryani
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.170-177
Visually impaired people can use smartphone navigation applications to reach their destination, but those applications do not detect moving objects. This paper presents an Android application that uses the smartphone's camera for real-time object detection. Images captured by the camera are processed digitally, and a Convolutional Neural Network (CNN) stored on the mobile device predicts the objects in each processed image, returning a bounding box for each detected object. These bounding boxes are used to calculate the distance from the object to the camera. The model used is SSD MobileNet V1, pre-trained on the Common Objects in Context (COCO) dataset. System testing is divided into object distance and accuracy testing. Results show that the margin of error in the calculated distance is below 5% for distances under 8 meters. The mean average precision is 0.9393 and the mean average recall is 0.4479, which means the system can recognize moving objects through the model embedded in a smartphone.
{"title":"Mobile Surveillance Siren Against Moving Object as a Support System for Blind PeopleMobile Surveillance Siren Against Moving Object as a Support System for Blind People","authors":"D. H. Hareva, A. Sebastian, A. Mitra, Irene A. Lazarusli, C. Haryani","doi":"10.18178/joig.11.2.170-177","DOIUrl":"https://doi.org/10.18178/joig.11.2.170-177","url":null,"abstract":"Visually impaired people can use smartphone navigation applications to arrive at their destination. However, those applications do not provide the means to detect moving objects. This paper presents an Android application that uses the smartphone’s camera to provide real-time object detection. Images captured by the camera are to be processed digitally. The model then predicts objects from the processed image using a Convolutional Neural Network (CNN) stored in mobile devices. The model returns bounding boxes for each of the detected objects. These bounding boxes are used to calculate the distance from the object to the camera. The model used is SSD MobileNet V1, which is pre-trained using the Common Objects in Context (COCO) dataset. System testing is divided into object distance and accuracy testing. Results show that the margin of error for calculating distance is below 5% for distances under 8 meters. The mean average precision is 0.9393, while the mean average recall is 0.4479. It means that the system can recognize moving objects through the embedded model in a smartphone.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83167665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instant Counting & Vehicle Detection during Hajj Using Drones
Abdullah M. Algamdi, Hammam M. AlGhamdi
Pub Date: 2023-06-01. DOI: 10.18178/joig.11.2.204-211
During the past decade, artificial intelligence technologies, especially Computer Vision (CV), have experienced significant breakthroughs due to the development of deep learning models, particularly Convolutional Neural Networks (CNNs). These networks have been utilized in research applications including astronomy, marine science, security, medicine, and pathology. In this paper, we build a framework utilizing CV technology to support decision-makers during the Hajj season. We collect and process real-time images from multiple drones that follow the pilgrims as they move around the holy sites during Hajj. The images are processed in two stages. First, we purify the images collected from the drones and stitch them, producing one image that captures the whole holy site. Second, the stitched image is processed by a CNN to provide two pieces of information: (1) the number of buses and ambulances, and (2) the estimated count of pilgrims. This information can help decision-makers identify needs for further support during Hajj, such as logistics services, security personnel, and ambulances.
{"title":"Instant Counting & Vehicle Detection during Hajj Using Drones","authors":"Abdullah M. Algamdi, Hammam M. AlGhamdi","doi":"10.18178/joig.11.2.204-211","DOIUrl":"https://doi.org/10.18178/joig.11.2.204-211","url":null,"abstract":"During the past decade, artificial intelligence technologies, especially Computer Vision (CV) technologies, have experienced significant breakthroughs due to the development of deep learning models, particularly Convolutional Neural Networks (CNNs). These networks have been utilized in various research applications, including astronomy, marine sciences, security, medicine, and pathology. In this paper, we build a framework utilizing CV technology to support decision-makers during the Hajj season. We collect and process real-time/instant images from multiple aircraft/drones, which follow the pilgrims while they move around the holy sites during Hajj. These images, taken by multiple drones, are processed in two stages. First, we purify the images collected from multiple drones and stitch them, producing one image that captures the whole holy site. Second, the stitched image is processed using a CNN to provide two pieces of information: (1) the number of buses and ambulances; and (2) the estimated count of pilgrims. This information could help decision-makers identify needs for further support during Hajj, such as logistics services, security personnel, and/or ambulances.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":"88 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72488825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}