
International Journal of Image and Graphics: Latest Publications

DLMDish: Using Applied Deep Learning and Computer Vision to Automatically Classify Mauritian Dishes
IF 1.6 Q3 Computer Science Pub Date: 2023-11-18 DOI: 10.1142/s0219467825500457
Mohammud Shaad Ally Toofanee, Omar Boudraa, Karim Tamine
The benefits of using an automatic dietary assessment system to help diabetes patients and prediabetic persons control obesity, a risk factor often referred to as a "pandemic", are now widely proven and accepted. However, there is no universal solution, as people's eating habits depend on context and culture. This project is a cornerstone for future work by researchers and health professionals on automatic dietary assessment of Mauritian dishes. We propose a process to produce a food dataset for Mauritian dishes using a Generative Adversarial Network (GAN), together with a fine-tuned Convolutional Neural Network (CNN) model for identifying Mauritian food dishes. The outputs and findings of this research can be used for automatic calorie calculation and food recommendation, primarily on ubiquitous devices such as mobile phones via mobile applications. Using the Adam optimizer with carefully fixed hyper-parameters, we achieved an accuracy of 95.66% and a loss of 3.5% on the recognition task.
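To make the "Adam optimizer with carefully fixed hyper-parameters" step concrete, the following is a minimal NumPy sketch of Adam driving a softmax classifier, with the standard fixed hyper-parameters (learning rate, beta values, epsilon). The data, dimensions, and class count are synthetic placeholders, not the paper's dataset or CNN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 32, 5                  # samples, feature dim, dish classes (placeholders)
X = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)
W = np.zeros((d, k))

# Adam state and its commonly fixed hyper-parameters
m, v = np.zeros_like(W), np.zeros_like(W)
lr, b1, b2, eps = 1e-2, 0.9, 0.999, 1e-8

def loss_and_grad(W):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(n), y]).mean()     # cross-entropy
    p[np.arange(n), y] -= 1.0
    return loss, X.T @ p / n                      # gradient w.r.t. W

first_loss, _ = loss_and_grad(W)
for t in range(1, 201):
    loss, g = loss_and_grad(W)
    m = b1 * m + (1 - b1) * g                     # first-moment estimate
    v = b2 * v + (1 - b2) * g * g                 # second-moment estimate
    m_hat = m / (1 - b1 ** t)                     # bias correction
    v_hat = v / (1 - b2 ** t)
    W -= lr * m_hat / (np.sqrt(v_hat) + eps)      # Adam update
final_loss, _ = loss_and_grad(W)
print(final_loss < first_loss)
```

In the paper's setting the same update rule would be applied to the CNN's weights by a deep learning framework rather than to a single linear layer.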
Citations: 0
A Novel Diabetes Prediction Model in Big Data Healthcare Systems Using DA-KNN Technique
Q3 Computer Science Pub Date: 2023-11-03 DOI: 10.1142/s0219467825500469
N. P. Jayasri, R. Aruna
Over the past decades, the number of people affected by diabetes, a chronic illness, has increased substantially. Early prediction of diabetes remains a challenging problem, as it requires clear and sound datasets for precise prediction. In this era of ubiquitous information technology, big data helps collect large amounts of information about healthcare systems. Owing to the explosion of digital data, selecting appropriate data for analysis remains a complex task. Moreover, missing values and insignificantly labeled data restrict prediction accuracy. In this context, with the aim of improving dataset quality, missing values are handled effectively in three major phases: (1) pre-processing, (2) feature extraction, and (3) classification. Pre-processing involves outlier rejection and the filling of missing values. Feature extraction is done by principal component analysis (PCA), and finally, precise prediction of diabetes is accomplished by an effective distance adaptive-KNN (DA-KNN) classifier. Experiments were conducted on the Pima Indian Diabetes (PID) dataset, and the performance of the proposed model was compared with state-of-the-art models. The analysis shows that the proposed model outperforms conventional models such as NB, SVM, KNN, and RF in terms of accuracy and ROC.
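The three-phase pipeline above can be sketched end to end in NumPy. This is an illustrative stand-in, not the paper's implementation: mean imputation substitutes for its missing-value handling, PCA is computed via SVD, and the "distance adaptive" behavior of DA-KNN is approximated here by inverse-distance-weighted voting; the data and labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
X_full = rng.normal(size=(120, 8))
X_full[:, :2] *= 3.0                                      # give two features real signal
y = (X_full[:, 0] + X_full[:, 1] > 0).astype(int)         # synthetic binary label
X = X_full.copy()
X[rng.random(X.shape) < 0.05] = np.nan                    # simulate missing entries

# (1) Pre-processing: fill missing values with column means
col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)

# (2) Feature extraction: project onto the top-3 principal components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T

# (3) Classification: distance-weighted KNN (k=5), leave-one-out evaluation
def predict(i, k=5):
    d = np.linalg.norm(Z - Z[i], axis=1)
    d[i] = np.inf                                  # exclude the query itself
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-9)                       # closer neighbors weigh more
    return int(np.bincount(y[nn], weights=w, minlength=2).argmax())

acc = np.mean([predict(i) == y[i] for i in range(len(y))])
print(acc > 0.6)
```

Outlier rejection, omitted here for brevity, would be applied before imputation in the same pre-processing phase.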
Citations: 0
Model Self-Adaptive Display for 2D–3D Registration
Q3 Computer Science Pub Date: 2023-11-03 DOI: 10.1142/s0219467825500421
Peng Zhang, Yangyang Miao, Dongri Shan, Shuang Li
In the 2D–3D registration process, differences in CAD model sizes mean that models may be too large to display in full or too small to show obvious features. To address these problems, previous studies adjusted parameters manually; however, this is imprecise and frequently requires multiple adjustments. Thus, in this paper, we propose the model self-adaptive display of fixed-distance and maximization (MSDFM) algorithm. The uncertainty of the model display affects the storage costs of pose images, and pose images themselves occupy a large amount of storage space; thus, we also propose the storage optimization based on the region of interest (SOBROI) method to reduce storage costs. The proposed MSDFM algorithm retrieves the farthest point of the model and then, through that point, searches for the pose image that maximizes the model display. The algorithm then changes the projection angle until the pose image is maximized within the window. The pose images are then cropped by the proposed SOBROI method to reduce storage costs. By labeling the connected domains in the binary pose image, the bounding rectangle of the largest connected domain is used to crop the pose image, which is then saved in the lossless-compression portable network graphics (PNG) format. Experimental results demonstrate that the proposed MSDFM algorithm can automatically adjust models of different sizes. In addition, the results show that the proposed SOBROI method reduces the storage space of pose libraries by at least 89.66% and at most 99.86%.
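The SOBROI cropping step, labeling connected domains in a binary pose image and cropping to the bounding rectangle of the largest one, can be sketched as follows. The 5x7 mask is a toy stand-in for a rendered binary pose image, and the BFS labeling here is a generic implementation, not the paper's code.

```python
import numpy as np
from collections import deque

mask = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
])

def largest_component_bbox(mask):
    """Label 4-connected domains; return bbox of the largest one."""
    seen = np.zeros(mask.shape, dtype=bool)
    best, best_size = None, 0
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                q, comp = deque([(sy, sx)]), []
                seen[sy, sx] = True
                while q:                            # BFS over one component
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) > best_size:
                    best_size, best = len(comp), comp
    ys = [p[0] for p in best]
    xs = [p[1] for p in best]
    return min(ys), min(xs), max(ys), max(xs)       # bounding rectangle

y0, x0, y1, x1 = largest_component_bbox(mask)
cropped = mask[y0:y1 + 1, x0:x1 + 1]
print(cropped.shape)
```

The cropped array would then be written out as a lossless PNG to realize the storage savings the paper reports.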
Citations: 0
Automatic Video Traffic Surveillance System with Number Plate Character Recognition Using Hybrid Optimization-Based YOLOv3 and Improved CNN
Q3 Computer Science Pub Date: 2023-11-03 DOI: 10.1142/s021946782550041x
Manoj Krishna Bhosale, Shubhangi B. Patil, Babasaheb B Patil
Recently, the growing number of surveillance cameras has raised the demand for more effective video coding. Modern video coding standards have appreciably enhanced coding efficiency, though they were developed for common videos rather than surveillance videos. Vehicle recognition techniques play a challenging and promising role in computer vision applications and intelligent transport systems. Most conventional techniques recognize vehicles only as bounding-box depictions and thus fail to provide precise vehicle locations, yet position details are essential for real-time applications such as estimating a vehicle's trajectory and motion on the road. Over the years, numerous advancements have been made in traffic surveillance through the spread of intelligent traffic video surveillance techniques. The ultimate goal of this model is to design and enhance intelligent traffic video surveillance using deep learning techniques. The model handles video traffic surveillance by measuring vehicle speeds and recognizing number plates. The first step is data collection, in which traffic video data are gathered. Vehicle detection is then performed by an optimized YOLOv3 deep learning classifier, whose parameters are optimized using the newly proposed Modified Coyote Spider Monkey Optimization (MCSMO), a combination of the Coyote Optimization Algorithm (COA) and Spider Monkey Optimization (SMO). The speed of each vehicle is measured from successive frames. For high-speed vehicles, the same optimized YOLOv3 is used to detect the number plates. Once the number plates are detected, plate character recognition is performed by the Improved Convolutional Neural Network (ICNN). Information about vehicles violating traffic rules can thus be conveyed to vehicle owners and the Regional Transport Office (RTO) for further action to prevent accidents. In experimental validation, the designed method achieves an accuracy of 97.53% and a precision of 96.83%. Experimental results show that the proposed method outperforms conventional models, thus helping ensure the security of the transport system.
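The per-frame speed measurement step can be sketched as follows, under two labeled assumptions: detections arrive as bounding-box centers per frame (in the paper these would come from the optimized YOLOv3 detector, hard-coded placeholders here), and a fixed meters-per-pixel calibration maps pixel displacement to road distance.

```python
FPS = 25.0                 # camera frame rate (assumed)
M_PER_PX = 0.05            # calibration: meters per pixel (assumed)

# Bounding-box centers of one tracked vehicle in consecutive frames (px)
centers = [(100.0, 200.0), (112.0, 200.0), (124.5, 200.0)]

def speed_kmh(c0, c1, fps=FPS, m_per_px=M_PER_PX):
    """Speed from the displacement between two consecutive frames."""
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    dist_m = (dx * dx + dy * dy) ** 0.5 * m_per_px   # pixels -> meters
    return dist_m * fps * 3.6                        # m/frame -> m/s -> km/h

speeds = [speed_kmh(a, b) for a, b in zip(centers, centers[1:])]
print(speeds)  # first interval: 12 px -> 0.6 m/frame -> 54 km/h
```

A real deployment would additionally need a perspective-corrected (homography-based) calibration rather than a single scalar meters-per-pixel factor.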
Citations: 0
Metaheuristic-Assisted Contextual Post-Filtering Method for Event Recommendation System
Q3 Computer Science Pub Date: 2023-11-03 DOI: 10.1142/s0219467825500433
B. N. Nithya, D. Evangelin Geetha, Manish Kumar
In today’s world, the web is a prominent communication channel. However, the variety of options available on event-based social networks (EBSNs) makes it difficult for users to choose the events most relevant to their interests. In EBSNs, searching for events that fit a user’s preferences is necessary but complex and time consuming because of the large number of available events. Toward this end, a community-contributed data event recommender framework helps consumers filter daunting amounts of information and provides appropriate feedback, making EBSNs more appealing to them. This research work introduces a novel customized event recommendation system that ranks events using a multi-criteria decision-making (MCDM) approach. The proposed model computes categorical, geographical, temporal, and social factors, and the recommendation list is ordered by a contextual post-filtering system that includes Weight and Filter steps. To refine the recommendation list, a new probabilistic weight model is added. This model further incorporates metaheuristic reasoning, which fine-tunes the probabilistic threshold value using a new hybrid algorithm. The proposed hybrid model, referred to as the Beetle Swarm Hybridized Elephant Herding Algorithm (BSH-EHA), combines Elephant Herding Optimization (EHO) and the Beetle Swarm Optimization (BSO) algorithm. Finally, the top recommendations are given to the users.
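The Weight-and-Filter post-filtering idea can be sketched as: each candidate event has a base relevance score, which is re-weighted by the four contextual factors and then filtered against a threshold. All factor values, weights, and the threshold below are illustrative placeholders; in the paper the threshold would be tuned by the BSH-EHA metaheuristic, which is not reproduced here.

```python
events = {
    "concert":  {"base": 0.8, "context": {"cat": 0.9, "geo": 0.4, "time": 0.7, "social": 0.6}},
    "workshop": {"base": 0.6, "context": {"cat": 0.8, "geo": 0.9, "time": 0.9, "social": 0.5}},
    "meetup":   {"base": 0.7, "context": {"cat": 0.3, "geo": 0.2, "time": 0.5, "social": 0.4}},
}
weights = {"cat": 0.4, "geo": 0.3, "time": 0.2, "social": 0.1}  # factor weights (assumed)
THRESHOLD = 0.35                                # placeholder; metaheuristic-tuned in the paper

def contextual_score(e):
    # Weight step: blend the contextual factors, then re-weight the base score
    ctx = sum(weights[f] * e["context"][f] for f in weights)
    return e["base"] * ctx

# Filter step: rank by contextual score and drop events below the threshold
ranked = sorted(events, key=lambda n: contextual_score(events[n]), reverse=True)
recommended = [n for n in ranked if contextual_score(events[n]) >= THRESHOLD]
print(recommended)
```

The "meetup" event has a decent base score but poor contextual fit, so the post-filter removes it, which is exactly the behavior contextual post-filtering is meant to add over plain relevance ranking.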
Citations: 0
A Systematic Literature Review on Multimodal Image Fusion Models With Challenges and Future Research Trends
Q3 Computer Science Pub Date: 2023-11-03 DOI: 10.1142/s0219467825500391
Jampani Ravi, R. Narmadha
Imaging technology has undergone extensive development since 1985, with practical implications for both civilians and the military. Image fusion has recently emerged as a tool in image processing that is adept at handling diverse image types, including remote sensing and medical images, upgrading the information they carry by fusing visible and infrared light based on analysis of the materials used. At present, image fusion is applied mainly in the medical industry. Given the constraints of diagnosing a disease from single-modality images, image fusion can meet the prerequisites; hence, it is further suggested to develop fusion models that use different image modalities. The major intention of the fusion approach is to achieve higher contrast, enhancing image quality and apparent knowledge. Fused images are validated by three factors: (i) fused images should retain significant information from the source images, (ii) artifacts must not be present in the fused images, and (iii) the flaws of noise and misregistration must be avoided. Multimodal image fusion is a developing domain built on robust algorithms and standard transformation techniques. Thus, this work aims to analyze the contributions of various multimodal image fusion models that use intelligent methods. It provides an extensive literature survey of image fusion techniques and compares them with existing methods, covering the state of the art at diverse levels along with its pros and cons. This review gives an introduction to the current fusion methods, modes of multimodal fusion, the datasets used, and performance metrics; finally, it discusses the challenges of multimodal image fusion methods and future research trends.
Citations: 0
Adversarial Detection and Fusion Method for Multispectral Palmprint Recognition
Q3 Computer Science Pub Date: 2023-11-01 DOI: 10.1142/s0219467825500366
Yuze Zhou, Liwei Yan, Qi Zhu
As a promising biometric technology, multispectral palmprint recognition has attracted increasing attention in security due to its high recognition accuracy and ease of use. Notably, although multispectral palmprint data contain rich complementary information, multispectral palmprint recognition methods remain vulnerable to adversarial attacks: even if only one spectrum's image is attacked, the recognition results can be affected catastrophically. We therefore propose a robustness-enhanced multispectral palmprint recognition method comprising a model-interpretability-based adversarial detection module and a robust multispectral fusion module. Inspired by model interpretation technology, we found a large difference between clean palmprints and adversarial examples after CAM visualization; building an adversarial detector on these visualized images leads to better detection results. Finally, the weights of clean images and adversarial examples in the fusion layer are dynamically adjusted to obtain the correct recognition results. Experiments have shown that our method makes full use of the image features that are not attacked and effectively improves the robustness of the model.
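The dynamic-weight fusion idea can be sketched as: each spectrum contributes a feature vector, and its fusion weight is scaled by the adversarial detector's confidence that the spectrum is clean. The feature vectors and detector scores below are placeholders (in the paper the scores would come from the CAM-based adversarial detector), so this illustrates only the weighting mechanism.

```python
import numpy as np

features = {
    "red":   np.array([0.2, 0.8, 0.1]),
    "green": np.array([0.3, 0.7, 0.2]),
    "nir":   np.array([0.9, 0.1, 0.9]),   # suppose this spectrum was attacked
}
# Detector output per spectrum: near 1.0 = clean, near 0.0 = likely adversarial
clean_score = {"red": 0.95, "green": 0.90, "nir": 0.05}

def fuse(features, clean_score):
    w = np.array([clean_score[s] for s in features])
    w = w / w.sum()                        # normalize the dynamic weights
    mat = np.stack([features[s] for s in features])
    return w @ mat                         # weighted fusion of spectral features

fused = fuse(features, clean_score)
print(fused.round(3))
```

Because the attacked "nir" spectrum receives a near-zero weight, the fused vector stays close to the agreeing clean spectra, which is the robustness property the fusion module targets.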
引用次数: 0
Author Index (Volume 23)
IF 1.6 Q3 Computer Science Pub Date : 2023-11-01 DOI: 10.1142/s0219467823990012
Citations: 0
Automatic Tracking Method for 3D Human Motion Pose Using Contrastive Learning
Q3 Computer Science Pub Date : 2023-11-01 DOI: 10.1142/s0219467825500378
Zhipeng Li, Jun Wang, Lijun Hua, Honghui Liu, Wenli Song
Automatic tracking of three-dimensional (3D) human motion pose has the potential to provide technical support in many fields. However, existing methods for tracking human motion pose suffer from significant errors, long tracking times and suboptimal tracking results. To address these issues, an automatic tracking method for 3D human motion pose using contrastive learning is proposed. Using the feature parameters of 3D human motion poses, threshold variation parameters of the poses are computed. The golden section is introduced to transform the threshold variation parameters, and the features of 3D human motion poses are extracted by comparing the feature parameters against the threshold of parameter variation. Under the supervision of contrastive learning, a constraint loss is added to the local–global deep supervision module to extract local parameters of 3D human motion poses, combined with their local features. After normalizing the 3D human motion pose images, frame differences against the background image are calculated. By constructing an automatic tracking model for 3D human motion poses, automatic tracking is achieved. Experimental results demonstrate that the highest tracking lag is 9%, there is no deviation in node tracking, the pixel contrast is maintained above 90% and only six sub-blocks show detail loss. This indicates that the proposed method effectively tracks 3D human motion poses, tracks all the nodes, achieves high accuracy in automatic tracking and produces good tracking results.
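The frame-differencing step mentioned in the abstract can be illustrated minimally. This is only a sketch under assumptions: the threshold value and the toy frames are invented, and the paper's golden-section transform of the threshold parameters is not reproduced here.

```python
# Minimal frame-differencing sketch (illustrative values, not the paper's).

def frame_difference(prev, curr, thresh=20):
    """Flag pixels whose absolute change versus the previous frame exceeds
    `thresh` as motion (1); everything else is treated as background (0)."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]

prev = [[10, 10, 10], [10, 10, 10]]  # background frame (grayscale values)
curr = [[10, 80, 10], [10, 10, 90]]  # current frame with two moving pixels
print(frame_difference(prev, curr))  # [[0, 1, 0], [0, 0, 1]]
```

In a real pipeline the frames would first be normalized, as the abstract describes, so that a single threshold behaves consistently across the sequence.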
Citations: 0
Counting in Visual Question Answering: Methods, Datasets, and Future Work
Q3 Computer Science Pub Date : 2023-10-20 DOI: 10.1142/s0219467825500445
Tesfayee Meshu Welde, Lejian Liao
Visual Question Answering (VQA) is a language-based method for analyzing images that is highly helpful in assisting people with visual impairment. In contrast to task-specific models that simply classify objects into categories, a VQA system requires demonstrated holistic image understanding and performs basic reasoning about the image. VQA systems thus contribute to the growth of Artificial Intelligence (AI) by answering open-ended, arbitrary questions about a given image; VQA is also used to assess a system's ability through the Visual Turing Test (VTT). However, because the essential datasets are difficult to construct and because evaluation is hampered by flaws and bias, the overall efficiency of VQA systems cannot yet be assessed reliably. This is a significant limitation of VQA, and it in turn slows the performance progress observed in VQA algorithms. Current research therefore addresses more specific sub-problems, including counting in VQA systems. The counting sub-problem of VQA is especially sophisticated, riddled with challenging questions, above all complex counting questions that demand object identification together with detection of object attributes and positional reasoning. The pooling operation commonly used to implement the attention mechanism in VQA has been found to degrade counting performance, and a number of algorithms have been developed to address this issue. In this paper, we provide a comprehensive survey of counting techniques in VQA systems developed especially for answering questions such as “How many?”. However, the performance achieved so far is still not satisfactory, owing to bias introduced into the datasets by the way questions are phrased and to weak evaluation metrics. In the future, fully-fledged architectures, large datasets with complex counting questions and a detailed breakdown by category, and strong metrics for evaluating a system's ability to answer complex counting questions, such as those requiring positional and comparative reasoning, will be developed.
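The pooling problem the survey highlights has a simple arithmetic core: attention weights are normalized to sum to 1, so the pooled feature is invariant to how many relevant objects are present. A toy illustration (all values made up; scalar features stand in for feature vectors):

```python
# Why normalized attention pooling loses count information:
# whether one or four relevant objects are present, the pooled
# feature is the same, so a downstream classifier cannot count.

def attention_pool(features, relevances):
    """Normalize relevance scores into attention weights and return the
    weighted sum of the (scalar, for illustration) object features."""
    z = sum(relevances)
    weights = [r / z for r in relevances]
    return sum(w * f for w, f in zip(weights, features))

# Each relevant object has the same feature value and relevance score.
one_object = attention_pool([5.0], [1.0])
four_objects = attention_pool([5.0] * 4, [1.0] * 4)
print(one_object == four_objects)  # True: pooling cannot tell 1 from 4
```

This is why counting-oriented VQA methods replace or augment soft attention with mechanisms that keep per-object information, rather than collapsing it into one normalized average.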
Citations: 0