DLMDish: Using Applied Deep Learning and Computer Vision to Automatically Classify Mauritian Dishes
Mohammud Shaad Ally Toofanee, Omar Boudraa, Karim Tamine
International Journal of Image and Graphics. Pub Date: 2023-11-18 | DOI: 10.1142/s0219467825500457

The benefits of using an automatic dietary assessment system to help diabetic patients and prediabetic persons control the risk factor often referred to as the obesity “pandemic” are now widely proven and accepted. However, there is no universal solution, as people's eating habits depend on context and culture. This project is a cornerstone for future work by researchers and health professionals in the automatic dietary assessment of Mauritian dishes. We propose a process for producing a food dataset of Mauritian dishes using a Generative Adversarial Network (GAN), together with a fine-tuned Convolutional Neural Network (CNN) model for identifying those dishes. The outputs and findings of this research can be used for automatic calorie calculation and food recommendation, primarily on ubiquitous devices such as mobile phones via mobile applications. Using the Adam optimizer with carefully fixed hyper-parameters, we achieved an accuracy of 95.66% and a loss of 3.5% on the recognition task.
A Novel Diabetes Prediction Model in Big Data Healthcare Systems Using DA-KNN Technique
N. P. Jayasri, R. Aruna
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500469

In the past decades, there has been a wide increase in the number of people affected by diabetes, a chronic illness. Early prediction of diabetes remains a challenging problem, as it requires clear and sound datasets for precise prediction. In this era of ubiquitous information technology, big data helps collect large amounts of information on healthcare systems, but due to the explosion in the generation of digital data, selecting appropriate data for analysis remains a complex task. Moreover, missing values and insignificantly labeled data restrict prediction accuracy. In this context, with the aim of improving dataset quality, missing values are handled in three major phases: (1) pre-processing, (2) feature extraction, and (3) classification. Pre-processing involves outlier rejection and filling in missing values. Feature extraction is done by principal component analysis (PCA), and finally, precise prediction of diabetes is accomplished by an effective distance adaptive-KNN (DA-KNN) classifier. Experiments were conducted on the Pima Indian Diabetes (PID) dataset, and the performance of the proposed model was compared with state-of-the-art models. The analysis shows that the proposed model outperforms conventional models such as NB, SVM, KNN, and RF in terms of accuracy and ROC.
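To make the distance-adaptive idea concrete, here is a minimal sketch of a distance-weighted KNN classifier, in which nearer neighbours get larger votes. The abstract does not specify the exact DA-KNN weighting scheme, so the inverse-distance weighting below is an assumption used purely for illustration.

```python
import numpy as np

def da_knn_predict(X_train, y_train, x, k=3, eps=1e-8):
    """Distance-weighted KNN: each of the k nearest neighbours votes for
    its label with weight 1/distance, so closer points count more.
    (Generic inverse-distance weighting, not the paper's exact scheme.)"""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each training point
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbours
    w = 1.0 / (d[idx] + eps)                  # inverse-distance vote weights
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)          # label with the largest total weight

# Toy example: two clusters (0 = non-diabetic, 1 = diabetic)
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(da_knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # → 1
```

Because the votes are distance-weighted, a query point deep inside one cluster is classified correctly even when the k-neighbourhood spans both clusters.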
Model Self-Adaptive Display for 2D–3D Registration
Peng Zhang, Yangyang Miao, Dongri Shan, Shuang Li
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500421

In the 2D–3D registration process, CAD models of different sizes may be too large to display in full or too small to show obvious features. Previous studies have addressed these problems by adjusting parameters manually; however, this is imprecise and frequently requires multiple adjustments. Thus, in this paper, we propose the model self-adaptive display of fixed-distance and maximization (MSDFM) algorithm. Because the uncertainty of the model display affects the storage cost of pose images, and pose images themselves occupy a large amount of storage space, we also propose the storage optimization based on the region of interest (SOBROI) method to reduce storage costs. The MSDFM algorithm retrieves the farthest point of the model, searches through that point for the pose image that maximizes the model display, and changes the projection angle until the pose image is maximized within the window. The pose images are then cropped by the SOBROI method: after labeling the connected domains in the binary pose image, the bounding rectangle of the largest connected domain is used to crop the pose image, which is saved in the lossless Portable Network Graphics (PNG) format. Experimental results demonstrate that the proposed MSDFM algorithm automatically adjusts models of different sizes, and that the proposed SOBROI method reduces the storage space of pose libraries by at least 89.66% and at most 99.86%.
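The SOBROI cropping step rests on a standard operation: find the largest connected domain in a binary image and take its bounding rectangle. A minimal sketch of that step (4-connected components via BFS, pure Python; the PNG encoding itself is omitted):

```python
from collections import deque

def largest_component_bbox(mask):
    """Return (top, left, bottom, right), inclusive, of the bounding
    rectangle of the largest 4-connected region of 1s in a binary mask
    (given as a list of lists)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best, best_size = None, 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                # BFS over this component, tracking its extent and size
                q = deque([(i, j)])
                seen[i][j] = True
                size, top, left, bot, right = 0, i, j, i, j
                while q:
                    r, c = q.popleft()
                    size += 1
                    top, left = min(top, r), min(left, c)
                    bot, right = max(bot, r), max(right, c)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            q.append((nr, nc))
                if size > best_size:
                    best_size, best = size, (top, left, bot, right)
    return best

mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
print(largest_component_bbox(mask))  # → (1, 1, 2, 2)
```

Cropping the pose image to this rectangle before saving is what removes the empty background that otherwise dominates storage.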
Automatic Video Traffic Surveillance System with Number Plate Character Recognition Using Hybrid Optimization-Based YOLOv3 and Improved CNN
Manoj Krishna Bhosale, Shubhangi B. Patil, Babasaheb B Patil
Pub Date: 2023-11-03 | DOI: 10.1142/s021946782550041x

Recently, the growing number of surveillance cameras has increased the demand for more effective video coding. Modern video coding standards have appreciably enhanced coding efficiency, but they were developed for general videos rather than surveillance videos. Vehicle recognition techniques play a challenging and promising role in computer vision applications and intelligent transport systems. Most conventional techniques recognize vehicles with a bounding-box depiction but fail to provide precise vehicle locations, even though position details are vital for real-time applications such as estimating a vehicle's trajectory and motion on the road. Numerous advancements have been made in the traffic surveillance area over the years through the spread of intelligent traffic video surveillance techniques. The ultimate goal of this model is to design and enhance intelligent traffic video surveillance using deep learning. The model handles video traffic surveillance by measuring vehicle speeds and recognizing number plates. The initial step is data collection, in which the traffic video data is gathered. Vehicle detection is then performed by an Optimized YOLOv3 deep learning classifier whose parameters are tuned by the newly recommended Modified Coyote Spider Monkey Optimization (MCSMO), a combination of the Coyote Optimization Algorithm (COA) and Spider Monkey Optimization (SMO). The speed of each vehicle is measured from frame to frame. For high-speed vehicles, the same Optimized YOLOv3 is used to detect the number plates, and plate character recognition is then performed by an Improved Convolutional Neural Network (ICNN). Information about vehicles violating traffic rules can thus be conveyed to the vehicle owners and the Regional Transport Office (RTO) to take further action and avoid accidents. In experimental validation, the accuracy and precision of the designed method reach 97.53% and 96.83%, respectively. Experimental results show that the proposed method achieves enhanced performance compared to conventional models, helping to ensure the security of the transport system.
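Measuring speed from per-frame detections reduces to converting pixel displacement between frames into real-world distance per unit time. The abstract does not give the calibration details, so the metres-per-pixel factor below is an assumption; real systems would calibrate it from known road geometry.

```python
def vehicle_speed_kmh(p1, p2, fps, metres_per_pixel):
    """Estimate a vehicle's speed from its pixel centroid in two
    consecutive frames.

    p1, p2: (x, y) centroids in pixels; fps: camera frame rate;
    metres_per_pixel: calibration factor (assumed known here)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    pixels = (dx * dx + dy * dy) ** 0.5      # displacement in pixels per frame
    metres_per_second = pixels * metres_per_pixel * fps
    return metres_per_second * 3.6           # convert m/s to km/h

# 10 px moved between frames at 25 fps, with 0.05 m per pixel:
print(vehicle_speed_kmh((100, 200), (110, 200), fps=25, metres_per_pixel=0.05))  # → 45.0
```

Averaging this estimate over several frame pairs would smooth out detection jitter before deciding whether a vehicle exceeds the limit.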
Metaheuristic-Assisted Contextual Post-Filtering Method for Event Recommendation System
B. N. Nithya, D. Evangelin Geetha, Manish Kumar
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500433

In today's world, the web is a prominent communication channel. However, the variety of strategies available on event-based social networks (EBSNs) makes it difficult for users to choose the events most relevant to their interests. In EBSNs, searching for events that fit a user's preferences is necessary but complex and time consuming due to the large number of events available. Toward this end, a community-contributed data event recommender framework assists consumers in filtering daunting amounts of information and provides appropriate feedback, making EBSNs more appealing to them. This research work introduces a novel customized event recommendation system that ranks events using the multi-criteria decision-making (MCDM) approach. The proposed model computes categorical, geographical, temporal, and social factors, and orders the recommendation list with a contextual post-filtering scheme comprising Weight and Filter steps. A new probabilistic weight model is added to align the recommendation list. To be more constructive, the model incorporates metaheuristic reasoning to fine-tune the probabilistic threshold value with a new hybrid algorithm, the Beetle Swarm Hybridized Elephant Herding Algorithm (BSH-EHA), which combines Elephant Herding Optimization (EHO) and the Beetle Swarm Optimization (BSO) algorithm. Finally, the top recommendations are given to the users.
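A contextual post-filtering pass of the Weight-and-Filter kind can be sketched as: score each event as a weighted sum of its contextual factors, drop events below a threshold, and rank the rest. The factor weights and threshold below are illustrative assumptions; in the paper the threshold is tuned by the BSH-EHA metaheuristic rather than fixed by hand.

```python
def post_filter(events, weights, threshold):
    """Weight-and-Filter contextual post-filtering sketch.

    events: {event_id: {"categorical": s, "geographical": s,
                        "temporal": s, "social": s}}, scores in [0, 1].
    weights: importance of each contextual factor (assumed values).
    Returns (event_id, score) pairs above `threshold`, best first."""
    scored = {
        ev: sum(weights[f] * s for f, s in factors.items())  # Weight step
        for ev, factors in events.items()
    }
    kept = [(ev, s) for ev, s in scored.items() if s >= threshold]  # Filter step
    return sorted(kept, key=lambda t: t[1], reverse=True)           # final ranking

events = {
    "concert": {"categorical": 0.9, "geographical": 0.8, "temporal": 0.7, "social": 0.6},
    "meetup":  {"categorical": 0.2, "geographical": 0.9, "temporal": 0.4, "social": 0.3},
}
weights = {"categorical": 0.4, "geographical": 0.3, "temporal": 0.2, "social": 0.1}
print(post_filter(events, weights, threshold=0.5))  # only "concert" survives the filter
```

Tuning `threshold` (here by hand, in the paper by the hybrid metaheuristic) trades recommendation list length against relevance.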
A Systematic Literature Review on Multimodal Image Fusion Models With Challenges and Future Research Trends
Jampani Ravi, R. Narmadha
Pub Date: 2023-11-03 | DOI: 10.1142/s0219467825500391

Imaging technology has undergone extensive development since 1985, with practical implications for both civilians and the military. Image fusion is an emerging tool in image processing that is adept at handling diverse image types, including remote sensing and medical images, upgrading information through the fusion of visible and infrared light based on analysis of the materials involved. At present, image fusion is mainly performed in the medical industry: given the constraints of diagnosing a disease from single-modality images, image fusion can meet the prerequisites, and it is therefore suggested to develop fusion models using different image modalities. The major intention of the fusion approach is to achieve higher contrast, enhanced image quality, and apparent knowledge. Fused images are validated by three factors: (i) the fused image should retain significant information from the source images, (ii) artifacts must not be present in the fused image, and (iii) the flaws of noise and misregistration must be avoided. Multimodal image fusion is a developing domain built on robust algorithms and standard transformation techniques. Thus, this work analyzes the contributions of various multimodal image fusion models that use intelligent methods. It provides an extensive literature survey of image fusion techniques and compares them with existing methods, covering state-of-the-art image fusion methods at their diverse levels along with their pros and cons. This review introduces the current fusion methods, modes of multimodal fusion, the datasets used, and performance metrics; finally, it discusses the challenges of multimodal image fusion methods and future research trends.
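To anchor the terminology, the two simplest pixel-level fusion rules are the weighted average and the per-pixel maximum. These are baseline rules shown only to make the idea concrete; the surveyed methods use far richer transforms (wavelets, pyramids, deep models), and the modality names below are illustrative.

```python
import numpy as np

def fuse_images(a, b, alpha=0.5):
    """Pixel-wise weighted-average fusion of two co-registered images
    given as same-shape float arrays in [0, 1]."""
    return alpha * a + (1.0 - alpha) * b

def max_fuse(a, b):
    """Per pixel, keep the stronger response of the two modalities."""
    return np.maximum(a, b)

# Toy 2x2 "modalities" (e.g. an MRI-like and a CT-like channel):
mri = np.array([[0.2, 0.8], [0.4, 0.6]])
ct  = np.array([[0.6, 0.2], [0.4, 1.0]])
print(fuse_images(mri, ct))  # → [[0.4 0.5] [0.4 0.8]]
print(max_fuse(mri, ct))     # → [[0.6 0.8] [0.4 1. ]]
```

The average rule preserves overall brightness but can wash out contrast, while the max rule preserves salient structures from either modality; most of the surveyed transform-domain methods are refinements of exactly this trade-off.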
Adversarial Detection and Fusion Method for Multispectral Palmprint Recognition
Yuze Zhou, Liwei Yan, Qi Zhu
Pub Date: 2023-11-01 | DOI: 10.1142/s0219467825500366

As a promising biometric technology, multispectral palmprint recognition has attracted increasing attention in security due to its high recognition accuracy and ease of use. It is worth noting that although multispectral palmprint data contains rich complementary information, multispectral palmprint recognition methods are still vulnerable to adversarial attacks: even if only the image of one spectrum is attacked, the impact on the recognition results can be catastrophic. Therefore, we propose a robustness-enhanced multispectral palmprint recognition method comprising a model interpretability-based adversarial detection module and a robust multispectral fusion module. Inspired by model interpretation techniques, we found a large difference between clean palmprints and adversarial examples after CAM visualization, and using the visualized images to build an adversarial detector leads to better detection results. Finally, the weights of clean images and adversarial examples in the fusion layer are dynamically adjusted to obtain correct recognition results. Experiments show that our method makes full use of the image features that are not attacked and effectively improves the robustness of the model.
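The fusion idea can be sketched independently of the detector: down-weight any spectrum the detector flags as adversarial when combining per-spectrum matching scores. The fixed down-weighting constant and spectrum names below are assumptions for illustration; the paper adjusts the fusion weights dynamically rather than with a constant.

```python
def fuse_scores(spectrum_scores, adversarial_flags, eps=0.05):
    """Weighted fusion of per-spectrum matching scores.

    spectrum_scores: {spectrum: score in [0, 1]}.
    adversarial_flags: {spectrum: True if the detector flagged it}.
    Flagged spectra keep only a small residual weight `eps` (assumed),
    so the unattacked spectra dominate the fused score."""
    weights = {s: (eps if adversarial_flags[s] else 1.0) for s in spectrum_scores}
    total = sum(weights.values())
    return sum(weights[s] * spectrum_scores[s] for s in spectrum_scores) / total

scores = {"red": 0.91, "green": 0.88, "blue": 0.15, "nir": 0.90}  # "blue" was attacked
flags = {"red": False, "green": False, "blue": True, "nir": False}
print(round(fuse_scores(scores, flags), 3))  # → 0.884
```

Without the detector (all weights equal) the attacked spectrum drags the fused score down to about 0.71; with the down-weighting it stays near the clean spectra's level, which is the robustness gain the fusion layer provides.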
Author Index (Volume 23)
Pub Date: 2023-11-01 | DOI: 10.1142/s0219467823990012
Pub Date : 2023-11-01DOI: 10.1142/s0219467825500378
Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song
Automatic tracking of three-dimensional (3D) human motion pose has the potential to provide corresponding technical support in various fields. However, existing methods for tracking human motion pose suffer from significant errors, long tracking times and suboptimal tracking results. To address these issues, an automatic tracking method for 3D human motion pose using contrastive learning is proposed. By using the feature parameters of 3D human motion poses, threshold variation parameters of 3D human motion poses are computed. The golden section is introduced to transform the threshold variation parameters and extract the features of 3D human motion poses by comparing the feature parameters with the threshold of parameter variation. Under the supervision of contrastive learning, a constraint loss is added to the local–global deep supervision module of contrastive learning to extract local parameters of 3D human motion poses, combined with their local features. After normalizing the 3D human motion pose images, frame differences of the background image are calculated. By constructing an automatic tracking model for 3D human motion poses, automatic tracking of 3D human motion poses is achieved. Experimental results demonstrate that the highest tracking lag is 9%, there is no deviation in node tracking, the pixel contrast is maintained above 90% and only 6 sub-blocks have detail loss. This indicates that the proposed method effectively tracks 3D human motion poses, tracks all the nodes, achieves high accuracy in automatic tracking and produces good tracking results.
"Automatic Tracking Method for 3D Human Motion Pose Using Contrastive Learning" — Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song. International Journal of Image and Graphics. Pub Date: 2023-11-01. DOI: 10.1142/s0219467825500378.
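The abstract above mentions a frame-differencing step against the background image after normalization. As an illustrative sketch only (not the authors' implementation; the function name and threshold are ours), a minimal NumPy version of frame differencing for motion detection might look like:

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Binary mask of pixels that changed between two grayscale frames.

    A basic frame-difference step, sketched under the assumption that
    frames are uint8 grayscale arrays of equal shape; the name and the
    threshold value are illustrative, not from the paper.
    """
    # Widen dtype before subtracting so uint8 arithmetic cannot wrap around.
    prev = prev_frame.astype(np.int16)
    curr = curr_frame.astype(np.int16)
    diff = np.abs(curr - prev)
    return (diff > threshold).astype(np.uint8)

# Tiny synthetic example: an object "appears" in the current frame.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[2:4, 2:4] = 200  # 2x2 bright block in the current frame only
mask = frame_difference_mask(prev, curr)
print(mask.sum())  # -> 4 changed pixels
```

In a tracking pipeline, such a mask would typically feed a connected-component or contour step to localize the moving subject before pose estimation.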
Pub Date: 2023-10-20. DOI: 10.1142/s0219467825500445
Tesfayee Meshu Welde, Lejian Liao
Visual Question Answering (VQA) is a language-based method for analyzing images that is highly helpful in assisting people with visual impairment. A VQA system requires holistic image understanding and performs basic reasoning about the image, in contrast to task-specific models that simply classify objects into categories. Thus, VQA systems contribute to the growth of Artificial Intelligence (AI) technology by answering open-ended, arbitrary questions about a given image. In addition, VQA is used to assess a system’s ability through the Visual Turing Test (VTT). However, because suitable datasets are difficult to construct and evaluation is hampered by flaws and bias, current VQA benchmarks cannot assess a system’s overall efficiency. This is a significant limitation of VQA, and it has slowed the performance progress observed in VQA algorithms. Current research on VQA increasingly addresses specific sub-problems, including counting. The counting sub-problem is a particularly sophisticated one, riddled with challenging questions, especially complex counting questions that demand object identification together with detection of object attributes and positional reasoning. The pooling operation commonly used to implement attention in VQA has been found to degrade counting performance, and a number of algorithms have been developed to address this issue. In this paper, we provide a comprehensive survey of counting techniques in VQA systems developed specifically for answering questions such as “How many?”. However, the performance achieved so far remains unsatisfactory, owing to bias introduced into datasets by the way questions are phrased and to weak evaluation metrics.
In the future, fully-fledged architectures, large datasets with complex counting questions and detailed category breakdowns, and strong evaluation metrics for assessing a system’s ability to answer complex counting questions, such as those requiring positional and comparative reasoning, should be developed.
"Counting in Visual Question Answering: Methods, Datasets, and Future Work" — Tesfayee Meshu Welde, Lejian Liao. International Journal of Image and Graphics. DOI: 10.1142/s0219467825500445.
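The abstract's claim that attention pooling degrades counting can be made concrete with a toy example: a softmax-weighted average of identical region features yields the same pooled vector whether a scene contains one object or three, so the count is averaged away before the answer classifier ever sees it. The sketch below illustrates that effect; the function name `soft_attention_pool` is ours, not from the paper.

```python
import numpy as np

def soft_attention_pool(features, logits):
    """Softmax-normalized weighted average of region features.

    Illustrative stand-in for the attention pooling commonly used in
    VQA models: features has shape (num_regions, dim), logits has
    shape (num_regions,).
    """
    weights = np.exp(logits - logits.max())  # stable softmax
    weights /= weights.sum()
    return weights @ features  # (dim,) pooled representation

# Two scenes built from the same object feature vector.
obj = np.array([1.0, 0.5])             # one detected object's features
one_object = np.stack([obj])           # scene with 1 object
three_objects = np.stack([obj] * 3)    # scene with 3 identical objects

pooled_one = soft_attention_pool(one_object, np.zeros(1))
pooled_three = soft_attention_pool(three_objects, np.zeros(3))
print(np.allclose(pooled_one, pooled_three))  # -> True: count information is lost
```

Counting-specific modules surveyed in such work typically sidestep this by reasoning over the set of detected regions (e.g. summing rather than averaging evidence) instead of a normalized pooled vector.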