
International Journal of Image and Graphics: Latest Publications

Automatic Video Traffic Surveillance System with Number Plate Character Recognition Using Hybrid Optimization-Based YOLOv3 and Improved CNN
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-03 DOI: 10.1142/s021946782550041x
Manoj Krishna Bhosale, Shubhangi B. Patil, Babasaheb B Patil
Recently, the growing number of surveillance cameras has sharply increased the demand for more effective video coding. Modern video coding standards have appreciably improved coding efficiency, but they were developed for general-purpose video rather than surveillance footage. Vehicle recognition techniques play a challenging yet promising role in computer vision applications and intelligent transport systems. Most conventional techniques recognize vehicles only as bounding boxes and therefore fail to provide precise vehicle locations, yet position details are vital for real-time applications such as estimating a vehicle's motion trajectory on the road. Over the years, the rapid spread of intelligent traffic video surveillance techniques has brought numerous advances to the traffic surveillance area. The ultimate goal of this work is to design and enhance intelligent traffic video surveillance using deep learning. The proposed model handles video traffic surveillance by measuring vehicle speeds and recognizing number plates. The first step is data collection, in which traffic video data is gathered. Vehicle detection is then performed by an Optimized YOLOv3 deep learning classifier whose parameters are tuned by the newly proposed Modified Coyote Spider Monkey Optimization (MCSMO), a combination of the Coyote Optimization Algorithm (COA) and Spider Monkey Optimization (SMO). The speed of each vehicle is measured from frame to frame. For high-speed vehicles, the same Optimized YOLOv3 detects the number plates, and plate character recognition is performed by an Improved Convolutional Neural Network (ICNN). Information about vehicles violating traffic rules can thus be conveyed to the vehicle owners and the Regional Transport Office (RTO) for further action to prevent accidents. In experimental validation, the designed method achieves an accuracy of 97.53% and a precision of 96.83%, outperforming conventional models and thereby strengthening the security of the transport system.
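
The abstract does not spell out how vehicle speed is derived from each frame; a minimal sketch of a common approach is given below, assuming a fixed, calibrated camera so that detector bounding-box displacement between consecutive frames can be converted to km/h. The `metres_per_pixel` and `fps` parameters are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: estimating vehicle speed from detector output.
# Assumes a fixed, calibrated camera; metres_per_pixel and fps are
# illustrative parameters, not values taken from the paper.

def box_centroid(box):
    """Centre (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def speed_kmh(box_prev, box_curr, metres_per_pixel=0.05, fps=25.0):
    """Speed estimate from centroid displacement between consecutive frames."""
    (x0, y0), (x1, y1) = box_centroid(box_prev), box_centroid(box_curr)
    pixels = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5  # displacement in pixels
    metres_per_second = pixels * metres_per_pixel * fps
    return metres_per_second * 3.6  # m/s -> km/h

# Example: a box that moved 30 px between frames at 25 fps and 5 cm/px
print(speed_kmh((100, 200, 180, 260), (130, 200, 210, 260)))  # 135.0 km/h
```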
Citations: 0
Metaheuristic-Assisted Contextual Post-Filtering Method for Event Recommendation System
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-03 DOI: 10.1142/s0219467825500433
B. N. Nithya, D. Evangelin Geetha, Manish Kumar
In today’s world, the web is a prominent communication channel. However, the variety of strategies available on event-based social networks (EBSNs) makes it difficult for users to choose the events most relevant to their interests. Because so many events are available, searching an EBSN for events that fit a user’s preferences is necessary but complex and time-consuming. Toward this end, a community-contributed data event recommender framework helps consumers filter overwhelming information and provides appropriate feedback, making EBSNs more appealing. This research work introduces a novel customized event recommendation system that ranks events using the multi-criteria decision-making (MCDM) approach. The proposed model computes categorical, geographical, temporal, and social factors, and the recommendation list is ordered by a contextual post-filtering system comprising Weight and Filter steps. A new probabilistic weight model is added to align the recommendation list. To make the model more constructive, it incorporates metaheuristic reasoning that fine-tunes the probabilistic threshold value with a new hybrid algorithm, the Beetle Swarm Hybridized Elephant Herding Algorithm (BSH-EHA), which combines Elephant Herding Optimization (EHO) and the Beetle Swarm Optimization (BSO) algorithm. Finally, the top recommendations are presented to the users.
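
The Weight and Filter steps are not given as equations in the abstract; the sketch below shows the generic contextual post-filtering pattern, assuming contextual relevance is estimated as a simple matching probability. The `contextual_prob` scoring and the 0.3 threshold are illustrative stand-ins; in the paper the threshold is tuned by the BSH-EHA metaheuristic.

```python
# Hedged sketch of contextual post-filtering (Weight + Filter).
# contextual_prob() is a hypothetical stand-in for however the paper
# estimates P(relevant | context); the 0.3 threshold is illustrative.

def contextual_prob(event, context):
    """Toy relevance estimate: fraction of context fields the event matches."""
    matched = sum(1 for k, v in context.items() if event.get(k) == v)
    return matched / max(len(context), 1)

def post_filter(ranked, context, threshold=0.3):
    """Weight: scale base scores by contextual probability.
    Filter: drop events whose contextual probability is below threshold."""
    weighted = [(e, e["score"] * contextual_prob(e, context)) for e in ranked]
    kept = [(e, s) for e, s in weighted if contextual_prob(e, context) >= threshold]
    return [e for e, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]

events = [
    {"id": 1, "score": 0.9, "city": "Bangalore", "category": "music"},
    {"id": 2, "score": 0.8, "city": "Mumbai", "category": "music"},
]
print(post_filter(events, {"city": "Bangalore", "category": "music"}))
```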
Citations: 0
A Systematic Literature Review on Multimodal Image Fusion Models With Challenges and Future Research Trends
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-03 DOI: 10.1142/s0219467825500391
Jampani Ravi, R. Narmadha
Imaging technology has undergone extensive development since 1985, with practical implications for both civilian and military use. Image fusion is an emerging tool in image processing that is adept at handling diverse image types, including remote sensing and medical images, upgrading the information they carry by fusing visible and infrared light based on an analysis of the materials involved. At present, image fusion is applied mainly in the medical industry: given the limitations of diagnosing a disease from single-modality images, fusion can meet the prerequisites, so developing fusion models that combine different imaging modalities is strongly suggested. The major intention of the fusion approach is to achieve higher contrast, enhancing image quality and apparent knowledge. Fused images are validated against three factors: (i) the fused image should retain the significant information from the source images, (ii) artifacts must not be introduced, and (iii) flaws such as noise and misregistration must be avoided. Multimodal image fusion is a developing domain built on robust algorithms and standard transformation techniques. This work therefore analyzes the contributions of various multimodal image fusion models that use intelligent methods. It provides an extensive literature survey of image fusion techniques and compares them with existing methods, covering the state of the art at its various levels along with the pros and cons of each approach. The review introduces current fusion methods, modes of multimodal fusion, the datasets used, and performance metrics, and finally discusses the challenges of multimodal image fusion and future research trends.
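
As a concrete instance of the pixel-level fusion the survey covers, the sketch below blends two co-registered images with a fixed convex weight; this is the simplest baseline fusion rule, offered for orientation only, not a method reviewed in the paper.

```python
import numpy as np

def weighted_fusion(visible, infrared, alpha=0.6):
    """Baseline pixel-level fusion: convex combination of two aligned,
    same-sized grayscale images (values in [0, 1]). alpha is illustrative."""
    assert visible.shape == infrared.shape, "inputs must be co-registered"
    return alpha * visible + (1.0 - alpha) * infrared

vis = np.random.rand(64, 64)   # stand-in for a visible-light image
ir = np.random.rand(64, 64)    # stand-in for an infrared image
fused = weighted_fusion(vis, ir)
print(fused.shape, fused.min() >= 0.0, fused.max() <= 1.0)
```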
Citations: 0
Adversarial Detection and Fusion Method for Multispectral Palmprint Recognition
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-01 DOI: 10.1142/s0219467825500366
Yuze Zhou, Liwei Yan, Qi Zhu
As a promising biometric technology, multispectral palmprint recognition has attracted increasing attention in security due to its high recognition accuracy and ease of use. Notably, although multispectral palmprint data contains rich complementary information, multispectral palmprint recognition methods remain vulnerable to adversarial attacks: even if only one spectrum’s image is attacked, the recognition results can be affected catastrophically. We therefore propose a robustness-enhanced multispectral palmprint recognition method comprising a model-interpretability-based adversarial detection module and a robust multispectral fusion module. Inspired by model interpretation techniques, we found a large difference between clean palmprints and adversarial examples after CAM visualization, and building the adversarial detector on these visualized images yields better detection results. Finally, the weights of clean images and adversarial examples in the fusion layer are adjusted dynamically to obtain correct recognition results. Experiments show that our method makes full use of the unattacked image features and effectively improves the robustness of the model.
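
The dynamic fusion-layer weighting can be illustrated as follows, assuming the adversarial detector outputs a per-spectrum probability of attack; flagged spectra are suppressed before the weights are renormalized. The 0.5 flag threshold and the suppression factor are illustrative, and `fuse_scores` is a hypothetical helper, not the paper’s implementation.

```python
import numpy as np

# Hedged sketch of the fusion-layer reweighting idea: spectra whose images
# an adversarial detector flags get their fusion weight suppressed before
# renormalisation. The paper trains its detector on CAM visualisations.

def fuse_scores(per_spectrum_scores, adversarial_probs, suppress=0.1):
    """per_spectrum_scores: (n_spectra, n_classes) match scores.
    adversarial_probs: detector's P(adversarial) for each spectrum."""
    scores = np.asarray(per_spectrum_scores, dtype=float)
    adv = np.asarray(adversarial_probs, dtype=float)
    weights = np.where(adv > 0.5, suppress, 1.0)   # suppress flagged spectra
    weights = weights / weights.sum()              # renormalise
    return weights @ scores                        # weighted class scores

scores = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]]      # 3 spectra, 2 identities
print(fuse_scores(scores, [0.05, 0.92, 0.10]))     # middle spectrum downweighted
```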
Citations: 0
Author Index (Volume 23)
IF 1.6 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-01 DOI: 10.1142/s0219467823990012
{"title":"Author Index (Volume 23)","authors":"","doi":"10.1142/s0219467823990012","DOIUrl":"https://doi.org/10.1142/s0219467823990012","url":null,"abstract":"","PeriodicalId":44688,"journal":{"name":"International Journal of Image and Graphics","volume":"75 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139300900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Tracking Method for 3D Human Motion Pose Using Contrastive Learning
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-11-01 DOI: 10.1142/s0219467825500378
Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song
Automatic tracking of three-dimensional (3D) human motion poses can provide technical support in many fields. However, existing pose-tracking methods suffer from significant errors, long tracking times, and suboptimal results. To address these issues, an automatic 3D human motion pose tracking method based on contrastive learning is proposed. Threshold variation parameters of 3D human motion poses are computed from the pose feature parameters, and the golden section is introduced to transform these threshold variation parameters; pose features are then extracted by comparing the feature parameters against the variation threshold. Under the supervision of contrastive learning, a constraint loss is added to its local-global deep supervision module to extract local parameters of 3D poses in combination with their local features. After normalizing the 3D human motion pose images, frame differences against the background image are calculated. An automatic tracking model for 3D human motion poses is then constructed, achieving automatic tracking. Experimental results show a maximum tracking lag of 9%, no deviation in node tracking, pixel contrast maintained above 90%, and detail loss in only 6 sub-blocks, indicating that the proposed method tracks 3D human motion poses effectively, tracks all nodes, achieves high automatic-tracking accuracy, and produces good tracking results.
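
The frame-differencing step against the background image is standard; a minimal sketch is shown below, with an illustrative intensity threshold of 25 (the paper does not report its threshold).

```python
import numpy as np

def motion_mask(frame, background, threshold=25):
    """Absolute frame difference against a static background image,
    thresholded to a binary foreground mask. Both inputs are uint8
    grayscale arrays of the same shape; the threshold is illustrative."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

bg = np.full((4, 4), 100, dtype=np.uint8)
fr = bg.copy()
fr[1:3, 1:3] = 180            # a "moving" region
print(motion_mask(fr, bg))    # 1s where the subject moved
```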
Citations: 0
Counting in Visual Question Answering: Methods, Datasets, and Future Work
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-10-20 DOI: 10.1142/s0219467825500445
Tesfayee Meshu Welde, Lejian Liao
Visual Question Answering (VQA) is a language-based method for analyzing images that is highly helpful in assisting people with visual impairment. In contrast to task-specific models that simply classify objects into categories, a VQA system must demonstrate holistic image understanding and conduct basic reasoning about the image. VQA systems thus contribute to the growth of Artificial Intelligence (AI) by answering open-ended, arbitrary questions about a given image, and VQA is also used to assess a system’s ability through the Visual Turing Test (VTT). However, because essential datasets are hard to generate and evaluation is hampered by flaws and bias, VQA systems cannot yet be assessed for overall efficiency; this is a significant limitation that in turn slows the performance progress of VQA algorithms. Current VQA research addresses more specific sub-problems, including counting. The counting sub-problem is particularly sophisticated and riddled with challenging questions, especially complex counting questions that demand object identification together with attribute detection and positional reasoning. The pooling operation used to implement attention in VQA has been found to degrade counting performance, and a number of algorithms have been developed to address this issue. In this paper, we provide a comprehensive survey of counting techniques in VQA systems developed especially for answering questions such as “How many?”. The performance progress achieved so far remains unsatisfactory because of bias introduced into datasets by the way questions are phrased and because of weak evaluation metrics. Future work should deliver fully-fledged architectures; large datasets with complex counting questions and a detailed breakdown into categories; and strong evaluation metrics for measuring a system’s ability to answer complex counting questions, such as those requiring positional and comparative reasoning.
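
The claim that attention pooling degrades counting can be made concrete: because soft-attention weights are normalized to sum to one, the pooled feature for two identical objects equals the feature for one, so the count is unrecoverable from the pooled vector. A minimal numerical illustration:

```python
import numpy as np

def soft_attention_pool(features, logits):
    """Standard soft attention: softmax-normalised weighted sum of features."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w @ features

cat = np.array([1.0, 0.0])                      # toy "cat" feature vector
one_cat = soft_attention_pool(np.stack([cat]), np.array([2.0]))
two_cats = soft_attention_pool(np.stack([cat, cat]), np.array([2.0, 2.0]))
print(one_cat, two_cats)  # identical pooled vectors -> count information lost
```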
Citations: 0
A Comprehensive Review of GAN-Based Denoising Models for Low-Dose Computed Tomography Images
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-10-14 DOI: 10.1142/s0219467825500305
Manbir Sandhu, Sumit Kushwaha, Tanvi Arora
Computed Tomography (CT) offers excellent visualization of intricate internal body structures. To protect patients from potential radiation-related health risks, CT acquisition should adhere to the “as low as reasonably achievable” (ALARA) standard. However, acquired low-dose CT (LDCT) images are inadvertently corrupted by artifacts and noise during acquisition, storage, and transmission, which degrades visual quality and causes the loss of image features and relevant information. Most recently, generative adversarial network (GAN) models based on deep learning (DL) have demonstrated ground-breaking performance in minimizing image noise while maintaining high image quality. Their ability to adapt to uncertain noise distributions and their representation-learning capacity make them highly desirable for denoising CT images. This research paper comprehensively reviews the state-of-the-art GANs used for LDCT image denoising. The aim is to highlight the potential of DL-based GANs for CT dose optimization and to present the future scope of research in LDCT image denoising.
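
The adversarial objective shared by the reviewed GAN denoisers can be sketched as below: a generator maps a low-dose image to a denoised estimate while a discriminator tries to separate its outputs from normal-dose images, with a pixel-fidelity term added to the generator loss. The tiny convolutional networks and random tensors are placeholders, not any specific reviewed architecture.

```python
import torch
import torch.nn as nn

# Hedged sketch of GAN-based LDCT denoising: toy generator/discriminator
# and random tensors stand in for real networks and CT data.
G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Flatten(), nn.Linear(16 * 32 * 32, 1))  # 64x64 -> 32x32
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

low_dose = torch.rand(4, 1, 64, 64)    # stand-in LDCT batch
full_dose = torch.rand(4, 1, 64, 64)   # stand-in normal-dose batch

for _ in range(2):                     # two toy iterations
    # Discriminator step: real -> 1, denoised fake -> 0
    opt_d.zero_grad()
    d_loss = bce(D(full_dose), torch.ones(4, 1)) + \
             bce(D(G(low_dose).detach()), torch.zeros(4, 1))
    d_loss.backward()
    opt_d.step()
    # Generator step: adversarial term + pixel fidelity to the clean image
    opt_g.zero_grad()
    fake = G(low_dose)
    g_loss = bce(D(fake), torch.ones(4, 1)) + nn.functional.l1_loss(fake, full_dose)
    g_loss.backward()
    opt_g.step()
print(float(d_loss), float(g_loss))
```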
Citations: 0
Content-Based Image Retrieval (CBIR): Using Combined Color and Texture Features (TriCLR and HistLBP)
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-09-26 DOI: 10.1142/s0219467825500214
P. John Bosco, S. Janakiraman
Content-Based Image Retrieval (CBIR) is a broad research field in the current digital world. This paper focuses on content-based image retrieval using visual properties that carry high-level semantic information. The discrepancy between low-level and high-level features is known as the semantic gap, the biggest problem in CBIR. Visual characteristics are extracted as low-level features such as color, texture, and shape, and these low-level features raise CBIR performance. The paper mainly focuses on an image retrieval system that combines color features (TriCLR: RGB, YCbCr, and [Formula: see text]) with a histogram of LBP texture features (HistLBP), a hybrid of three color spaces with the LBP histogram (TriCLR and HistLBP). The study also discusses the hybrid method in light of low-level features. Finally, the hybrid approach uses the TriCLR and HistLBP algorithm, which provides a new solution to the CBIR system that outperforms existing methods.
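
A hedged sketch of a combined color-plus-LBP-histogram descriptor in the spirit of TriCLR and HistLBP is given below. The abstract’s third color space is elided (“[Formula: see text]”), so only RGB and YCrCb are used here; the bin counts and uniform-LBP settings are illustrative choices, not the paper’s.

```python
import numpy as np
import cv2
from skimage.feature import local_binary_pattern

def colour_hist(img, bins=8):
    """Per-channel intensity histogram, L1-normalised and concatenated."""
    chans = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    h = np.concatenate(chans).astype(float)
    return h / h.sum()

def lbp_hist(gray, P=8, R=1):
    """Histogram of uniform LBP codes (P + 2 possible values)."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    h, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
    return h.astype(float) / h.sum()

def descriptor(bgr):
    """Concatenated colour (RGB + YCrCb) and texture (LBP) histograms."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    return np.concatenate([colour_hist(bgr), colour_hist(ycrcb), lbp_hist(gray)])

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)  # stand-in image
q = descriptor(img)
print(q.shape)  # retrieval then ranks database images by a histogram distance
```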
Citations: 0
Deep Ensemble of Classifiers for Alzheimer’s Disease Detection with Optimal Feature Set
Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date : 2023-09-25 DOI: 10.1142/s0219467825500329
R. S. Rajasree, S. Brintha Rajakumari
Machine learning (ML) and deep learning (DL) techniques can considerably enhance the precise diagnosis of Alzheimer’s disease (AD). DL techniques have recently had considerable success in processing medical data, but they still have drawbacks, such as large data requirements and a protracted training phase. With this concern, we have developed a novel four-stage strategy. In the initial stage, the input data undergoes data-imbalance processing, which is crucial for enhancing the accuracy of disease detection. Subsequently, entropy-based, correlation-based, and improved mutual-information-based features are extracted from the pre-processed data. Because the curse of dimensionality is a serious issue in this work, we resolve it with an optimization strategy: the tunicate updated golden eagle optimization (TUGEO) algorithm is proposed to pick out the optimal subset of the extracted features. Finally, an ensemble classifier integrating CNN, DBN, and improved RNN models is trained on the selected optimal features to diagnose the disease. The suggested model achieves a maximum F-measure of 97.67, better than extant methods like [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text]. The suggested TUGEO-based AD detection is then compared with traditional models across various performance metrics, including accuracy, sensitivity, specificity, and precision.
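
One ingredient of the pipeline, scoring features by mutual information with the diagnosis label, can be sketched as below; the top-k cut is a simple stand-in for the TUGEO metaheuristic that performs the actual selection in the paper, and the random feature matrix is synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Hedged sketch: mutual-information feature scoring followed by top-k
# selection. The paper's TUGEO metaheuristic replaces the top-k step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # stand-in feature matrix
y = (X[:, 3] + 0.1 * rng.normal(size=200) > 0).astype(int)  # label tied to feature 3

scores = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(scores)[::-1][:5]      # indices of the 5 most informative features
print(top_k)                              # feature 3 should rank highly
X_selected = X[:, top_k]                  # reduced set passed to the ensemble
```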
Citations: 0