
International Journal of Image and Graphics: Latest Publications

Fault Signal Perception of Nanofiber Sensor for 3D Human Motion Detection Using Multi-Task Deep Learning
IF 1.6 Q3 Computer Science Pub Date: 2024-03-12 DOI: 10.1142/s0219467825500603
Yun Liu
Once a fault occurs in a nanofiber sensor, the reliability of three-dimensional (3D) human motion detection results is compromised. It is necessary to perceive the sensor's fault signals accurately and rapidly and to determine the fault type, so that the sensor can continue to operate stably. Therefore, we propose a fault signal perception method, based on multi-task deep learning, for nanofiber sensors used in 3D human motion detection. First, the fault characteristic parameters of the nanofiber sensor are obtained and the fault is reconstructed to complete fault location. Second, the fault signal is mapped by a penalty function, and a feature extraction model for the fault signal is constructed in combination with multi-task deep learning. Finally, the multi-task deep learning algorithm computes the sampling frequency of the fault signal, and the key variable information of the fault is extracted from the amplitude of the sensor's state changes, realizing perception of the fault signal. The results show that the proposed method can accurately perceive the fault signal of a nanofiber sensor in 3D human motion detection: the maximum fault location accuracy is 97%, and the maximum noise content of the fault signal is only 5 dB, indicating that the method can be widely applied to fault signal perception.
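The abstract does not give the network details. As a generic illustration of the multi-task idea it builds on, one shared representation feeding two fault-related output heads, here is a minimal NumPy sketch; the fault-location and fault-type tasks and all data are synthetic assumptions, not the authors' model.

```python
import numpy as np

# Synthetic sensor windows: 200 samples, 8 features. Two related tasks
# share one trunk: a hypothetical fault-location label and fault-type label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y_loc = (X[:, 0] > 0).astype(float)   # task 1: fault in segment 0 vs 1
y_typ = (X[:, 1] > 0).astype(float)   # task 2: fault type A vs B

W = rng.normal(scale=0.1, size=(8, 16))   # shared trunk weights
h_loc = rng.normal(scale=0.1, size=16)    # head for task 1
h_typ = rng.normal(scale=0.1, size=16)    # head for task 2
lr, losses = 0.1, []

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # binary cross-entropy, clipped for numerical safety
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)).mean()

for _ in range(300):
    H = np.tanh(X @ W)                          # shared representation
    p1, p2 = sigmoid(H @ h_loc), sigmoid(H @ h_typ)
    losses.append(bce(p1, y_loc) + bce(p2, y_typ))  # joint objective
    g1, g2 = p1 - y_loc, p2 - y_typ             # output-layer gradients
    dH = np.outer(g1, h_loc) + np.outer(g2, h_typ)
    W -= lr * X.T @ (dH * (1 - H ** 2)) / len(X)
    h_loc -= lr * H.T @ g1 / len(X)
    h_typ -= lr * H.T @ g2 / len(X)

acc_loc = ((sigmoid(np.tanh(X @ W) @ h_loc) > 0.5) == (y_loc > 0.5)).mean()
acc_typ = ((sigmoid(np.tanh(X @ W) @ h_typ) > 0.5) == (y_typ > 0.5)).mean()
```

Training both heads through one trunk is what lets the tasks share fault-relevant structure; the joint loss should fall as both tasks are fitted together.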
Citations: 0
Entropy Kernel Graph Cut Feature Space Enhancement with SqueezeNet Deep Neural Network for Textural Image Segmentation
IF 1.6 Q3 Computer Science Pub Date: 2024-03-12 DOI: 10.1142/s0219467825500640
M. Niazi, Kambiz Rahbar
Recently, image segmentation based on graph cut methods has shown remarkable performance on a range of image data. Although the kernel graph cut method performs well, its performance depends heavily on how the data are mapped to the transformation space and on the image features. The entropy-based kernel graph cut method is suitable for segmenting textured images; nonetheless, its segmentation quality remains contingent on the accuracy and richness of the feature space representation and of the kernel centers. This paper introduces an entropy-based kernel graph cut method that leverages the discriminative feature space extracted from SqueezeNet, a deep neural network. Fusing SqueezeNet’s features enriches the segmentation process by capturing high-level semantic information. Moreover, the extraction of kernel centers is refined through a weighted k-means approach, further improving the segmentation’s precision and effectiveness. The proposed method, while retaining the moderate computational load of graph cut methods, is a suitable alternative for segmenting textured images. Experimental results were obtained on a set of well-known datasets containing textured shapes to evaluate the efficiency of the algorithm against other well-known kernel graph cut methods.
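The weighted k-means refinement of kernel centers can be illustrated with a minimal sketch. This is plain NumPy on synthetic 2D features; the uniform weights and the seed-point initialization are demo assumptions, not the authors' exact formulation.

```python
import numpy as np

def weighted_kmeans(X, w, k, init, iters=50):
    """k-means where each point carries a weight; each center is the
    weighted mean of its assigned points."""
    centers = X[np.asarray(init)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign every point to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return centers, labels

# two synthetic feature clusters; uniform weights for the demo
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
w = np.ones(len(X))
centers, labels = weighted_kmeans(X, w, k=2, init=[0, 99])  # one seed per cluster
```

With non-uniform `w`, points the entropy term deems more informative would pull the centers toward themselves, which is the intuition behind weighting the center update.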
Citations: 0
Application of Generative Adversarial Network in Image Color Correction
IF 1.6 Q3 Computer Science Pub Date: 2024-03-06 DOI: 10.1142/s021946782550069x
Meiling Chen, Yao Shi, Lvfen Zhu
The popularity of electronic products has increased with the development of technology, and electronic devices allow people to obtain information through transmitted images. However, color distortion can occur during transmission, which may reduce the usefulness of the images. To address this, a deep residual network and a deep convolutional network were used to define the generator and discriminator. Self-attention-enhanced convolution was then applied to the generator network to construct an image resolution correction model based on coupled generative adversarial networks. On this basis, a generative network model integrating multi-scale features and a contextual attention mechanism was constructed to achieve image restoration. Finally, performance and image restoration application tests were conducted on the constructed model. When the coupled generative adversarial network was tested on the Set5 dataset, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values were 31.2575 and 0.8173; on the Set14 dataset, they were 30.8521 and 0.8079, respectively. The multi-scale feature-fusion algorithm was tested on the BSDS100 dataset with a PSNR of 30.2541 and an SSIM value of 0.8352. Based on these data, it can be concluded that the image correction model constructed in this study has a strong image restoration ability: the reconstructed image has the highest similarity to the real high-resolution image and a low distortion rate, and it can repair problems such as color distortion that arise during image transmission. In addition, this study can provide technical support for similar information correction and restoration work.
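The peak signal-to-noise ratio figures quoted in the abstract follow the standard definition, which can be computed as below. This is a generic implementation on synthetic images, not the authors' evaluation code.

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
noisy = np.clip(ref + rng.normal(0.0, 5.0, ref.shape), 0, 255)  # sigma = 5 noise
value = psnr(ref, noisy)   # sigma = 5 gives roughly 34 dB
```

SSIM, the other metric reported, is a windowed comparison of local means, variances, and covariances rather than a pixel-wise error, which is why it is quoted on a 0 to 1 scale.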
Citations: 0
Unconstrained Face Recognition Using Infrared Images
IF 1.6 Q3 Computer Science Pub Date: 2024-02-05 DOI: 10.1142/s0219467825500561
Asif Raza Butt, Zahid Ur Rahman, Anwar Ul Haq, Bilal Ahmed, Sajjad Manzoor
Recently, face recognition (FR) has become an important research topic due to the increase in video surveillance. However, surveillance images may contain vague non-frontal faces, especially with unidentifiable face poses or unconstrained environments such as bad illumination and darkness. As a result, most FR algorithms do not perform well when applied to such images. Moreover, it is common in the surveillance field that only a Single Sample per Person (SSPP) is available for identification. To resolve these issues, visible-spectrum infrared images were used, which work in entirely dark conditions without any light variation. Furthermore, to effectively improve FR for both the low-quality SSPP and the unidentifiable-pose problem, this paper proposes an approach that synthesizes a 3D face model and pose variations. A 2D frontal face image is used to generate a 3D face model, from which several virtual face test images with different poses are synthesized. The well-known Surveillance Camera’s Face (SCface) database is used to evaluate the proposed algorithm with PCA, LDA, KPCA, KFA, RSLDA, LRPP-GRR, deep KNN and DLIB deep learning. The effectiveness of the proposed method is verified through simulations, where increases in average recognition rates of up to 10%, 27.69%, 14.62%, 25.38%, 57.46%, 57.43%, 37.69% and 63.28%, respectively, are observed for the SCface database.
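Among the evaluated baselines, PCA-based matching is the simplest. A minimal sketch of PCA projection plus nearest-neighbor identification under the SSPP setting follows; random vectors stand in for flattened face images, so this illustrates the pipeline shape only, not the paper's experiments.

```python
import numpy as np

def pca_fit(X, n_components):
    """PCA via SVD of the mean-centered data; rows of X are flattened images."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]          # principal axes as rows

def pca_project(X, mean, components):
    return (X - mean) @ components.T

# SSPP setting: 5 identities, one gallery sample each
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 32 * 32))
mean, comps = pca_fit(gallery, n_components=4)
g_feat = pca_project(gallery, mean, comps)

# probe: a lightly perturbed copy of identity 2
probe = gallery[2] + rng.normal(scale=0.05, size=gallery.shape[1])
p_feat = pca_project(probe[None, :], mean, comps)
match = int(np.linalg.norm(g_feat - p_feat, axis=1).argmin())  # nearest gallery id
```

The paper's 3D-model-based pose synthesis effectively enlarges the gallery beyond one sample per identity before a subspace method like this is applied.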
Citations: 0
Congestion Avoidance in TCP Based on Optimized Random Forest with Improved Random Early Detection Algorithm
IF 1.6 Q3 Computer Science Pub Date: 2024-02-05 DOI: 10.1142/s021946782550055x
Ajay Kumar, Naveen Hemrajani
Transmission control protocol (TCP) ensures that data are transported safely and accurately over the network for applications that rely on the transport protocol for reliable information delivery. Internet usage is growing, and many protocols have been developed at the network layer. Congestion leads to packet loss, and the long time required for data transmission over end-to-end connections in the TCP transport layer is one of the biggest issues with the internet. To overcome these drawbacks, an optimized random forest algorithm (RFA) with improved random early detection (IRED) is proposed for congestion prediction and avoidance in the transport layer. Data are first gathered and passed through pre-processing to improve data quality. For pre-processing, KNN-based missing value imputation replaces values missing from the raw data, and [Formula: see text]-score normalization scales the data to a certain range. Congestion is then predicted using the optimized RFA, with the whale optimization algorithm (WOA) used to set the learning rate as efficiently as possible in order to reduce error and improve forecast accuracy. To avoid congestion, the IRED method is used to keep the transport-layer network congestion-free. Performance is evaluated and compared with existing techniques in terms of accuracy, precision, recall, specificity, and error, for which the proposed model attains 98%, 98%, 99%, 98%, and 1%, respectively. Throughput and latency are also evaluated to determine network performance. Finally, the proposed method performs better than existing techniques, and congestion is accurately predicted and avoided in the network.
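The two pre-processing steps can be sketched as follows. Note the normalization formula is elided in the source ("[Formula: see text]"), so the standard-score scaler below is an assumption, and the KNN imputer is a generic nearest-complete-rows variant rather than the authors' exact method.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest
    complete rows (distance measured on the row's observed features)."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]
    for i in range(len(X)):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        obs = ~missing
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        neighbors = complete[np.argsort(d)[:k]]
        X[i, missing] = neighbors[:, missing].mean(axis=0)
    return X

def standard_score(X):
    # zero mean, unit variance per feature
    return (X - X.mean(axis=0)) / X.std(axis=0)

raw = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, np.nan]])
filled = knn_impute(raw, k=3)       # NaN replaced by mean of 2.0, 2.1, 1.9
scaled = standard_score(filled)
```

Scaling every feature to a common range matters here because both the KNN distances and the downstream learner would otherwise be dominated by whichever traffic feature has the largest raw magnitude.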
Citations: 0
Multi-Modal Information Fusion for Localization of Emergency Vehicles
IF 1.6 Q3 Computer Science Pub Date: 2024-02-05 DOI: 10.1142/s0219467825500500
Arunakumar Joshi, Shrinivasrao B. Kulkarni
In urban environments, road transportation generates substantial traffic. This surge in vehicles leads to complex issues, including hindered emergency vehicle movement due to high density and congestion, and scarcity of human personnel amplifies these challenges. As traffic conditions worsen, the need for automated solutions to manage emergency situations becomes more evident. Intelligent traffic monitoring can identify and prioritize emergency vehicles, potentially saving lives. However, categorizing emergency vehicles through visual analysis faces difficulties such as clutter, occlusions, and traffic variations. Visual-based techniques for vehicle detection rely on clear rear views, which is problematic in dense traffic. In contrast, audio-based methods are resilient to the Doppler effect from moving vehicles, but handling diverse background noises remains unexplored. Using acoustics for emergency vehicle localization presents challenges related to sensor range and real-world noise. Addressing these issues, this study introduces a novel solution: combining visual and audio data for enhanced detection and localization of emergency vehicles in road networks. This multi-modal approach aims to bolster accuracy and robustness in emergency vehicle management. The proposed methodology consists of several key steps. The presence of an emergency vehicle is initially detected through pre-processing of visual images, involving the removal of clutter and occlusions via an adaptive background model. Subsequently, a cell-wise classification strategy using a customized Visual Geometry Group Network (VGGNet) deep learning model determines the presence of emergency vehicles within individual cells. To further reinforce the accuracy of emergency vehicle detection, the outcomes of the audio data analysis are integrated. This involves extracting spectral features from audio streams, followed by classification with a support vector machine (SVM) model. Information derived from the visual and audio sources is fused to construct a more comprehensive and refined traffic state map, which facilitates effective management of emergency vehicle transit. In empirical evaluations, the proposed solution mitigates challenges such as visual clutter, occlusions, and variations in traffic density, which are common issues in traditional visual analysis methods. Notably, the proposed approach achieves an accuracy of approximately 98.15% in localizing emergency vehicles.
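The audio branch's spectral feature extraction step might look like the following, in plain NumPy. The specific descriptors (spectral centroid and bandwidth) are common choices assumed here, since the abstract does not name them; the SVM classifier would then operate on such feature vectors.

```python
import numpy as np

def spectral_features(frame, sr):
    """Two classic spectral descriptors of one audio frame:
    spectral centroid (the 'center of mass' of the magnitude spectrum)
    and spectral bandwidth (the spread around the centroid)."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = mag / (mag.sum() + 1e-12)              # normalized magnitude spectrum
    centroid = float((freqs * p).sum())
    bandwidth = float(np.sqrt(((freqs - centroid) ** 2 * p).sum()))
    return centroid, bandwidth

sr = 16000
t = np.arange(1024) / sr                      # one 64 ms frame
siren = np.sin(2 * np.pi * 1200.0 * t)        # siren-like 1.2 kHz tone
hum = np.sin(2 * np.pi * 100.0 * t)           # engine-like 100 Hz hum
c_siren, _ = spectral_features(siren, sr)
c_hum, _ = spectral_features(hum, sr)         # the siren sits much higher
```

A siren's energy concentrates well above typical engine and road noise, which is what makes low-dimensional spectral descriptors like these separable by an SVM.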
Citations: 0
Unconstrained Face Recognition Using Infrared Images 利用红外图像进行无约束人脸识别
IF 1.6 Q3 Computer Science Pub Date : 2024-02-05 DOI: 10.1142/s0219467825500561
Asif Raza Butt, Zahid Ur Rahman, Anwar Ul Haq, Bilal Ahmed, Sajjad Manzoor
Recently, face recognition (FR) has become an important research topic due to increase in video surveillance. However, the surveillance images may have vague non-frontal faces, especially with the unidentifiable face pose or unconstrained environment such as bad illumination and dark environment. As a result, most FR algorithms would not show good performance when they are applied on these images. On the contrary, it is common at surveillance field that only Single Sample per Person (SSPP) is available for identification. In order to resolve such issues, visible spectrum infrared images were used which can work in entirely dark condition without having any light variations. Furthermore, to effectively improve FR for both the low-quality SSPP and unidentifiable pose problem, an approach to synthesize 3D face modeling and pose variations is proposed in this paper. A 2D frontal face image is used to generate a 3D face model. Then several virtual face test images with different poses are synthesized from this model. A well-known Surveillance Camera’s Face (SCface) database is utilized to evaluate the proposed algorithm by using PCA, LDA, KPCA, KFA, RSLDA, LRPP-GRR, deep KNN and DLIB deep learning. The effectiveness of the proposed method is verified through simulations, where increase in average recognition rates up to 10%, 27.69%, 14.62%, 25.38%, 57.46%, 57.43, 37.69% and 63.28%, respectively, for SCface database as observed.
Congestion Avoidance in TCP Based on Optimized Random Forest with Improved Random Early Detection Algorithm
IF 1.6 Q3 Computer Science Pub Date : 2024-02-05 DOI: 10.1142/s021946782550055x
Ajay Kumar, Naveen Hemrajani
Transmission control protocol (TCP) ensures that data are transported safely and accurately over the network for applications that rely on the transport protocol for reliable information delivery. Nowadays, internet usage is growing, and many protocols have been developed at the network layer. Congestion leads to packet loss, and the long time required for end-to-end data transmission at the TCP transport layer is one of the biggest issues with the internet. To overcome these drawbacks, an optimized random forest algorithm (RFA) with improved random early detection (IRED) is proposed for congestion prediction and avoidance at the transport layer. Data are initially gathered and sent through pre-processing to improve data quality. For pre-processing, KNN-based missing-value imputation is applied to replace values that are missing in the raw data, and [Formula: see text]-score normalization is utilized to scale the data to a certain range. Following that, congestion is predicted using the optimized RFA, and the whale optimization algorithm (WOA) is used to set the learning rate as efficiently as possible in order to reduce error and improve forecast accuracy. To avoid congestion, the IRED method is utilized for a congestion-free network at the transport layer. Performance metrics are evaluated and compared with existing techniques with respect to accuracy, precision, recall, specificity, and error, whose values for the proposed model are 98%, 98%, 99%, 98%, and 1%, respectively. Throughput and latency are also evaluated to determine the performance of the network. Finally, the proposed method performs better than the existing techniques, and congestion prediction and avoidance are carried out accurately in the network.
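The pre-processing and classification chain described above (KNN missing-value imputation, score normalization, then a random forest) can be sketched with scikit-learn. The traffic features, the toy congestion label, and the use of z-score scaling via `StandardScaler` are assumptions made for illustration; the paper's WOA-based hyperparameter tuning and the IRED queue-management step are omitted.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Hypothetical per-connection features, e.g. queue length, RTT, drop rate.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy rule: 1 = congested
X[rng.random(X.shape) < 0.05] = np.nan    # inject ~5% missing values

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X)  # KNN-based imputation
X_scaled = StandardScaler().fit_transform(X_imputed)    # z-score normalization
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_scaled, y)
train_acc = clf.score(X_scaled, y)
```

In a real deployment the forest's hyperparameters (here fixed at 100 trees) would be set by the optimizer rather than by hand, and the classifier's output would gate the early-detection drop policy.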
Multi-Modal Information Fusion for Localization of Emergency Vehicles
IF 1.6 Q3 Computer Science Pub Date : 2024-02-05 DOI: 10.1142/s0219467825500500
Arunakumar Joshi, Shrinivasrao B. Kulkarni
In urban environments, road transportation generates substantial traffic. This surge in vehicles leads to complex issues, including hindered emergency vehicle movement due to high density and congestion, and the scarcity of human personnel amplifies these challenges. As traffic conditions worsen, the need for automated solutions to manage emergency situations becomes more evident. Intelligent traffic monitoring can identify and prioritize emergency vehicles, potentially saving lives. However, categorizing emergency vehicles through visual analysis faces difficulties such as clutter, occlusions, and traffic variations. Visual vehicle-detection techniques rely on clear rear views, which is problematic in dense traffic. In contrast, audio-based methods are resilient to the Doppler effect from moving vehicles, but handling diverse background noises remains unexplored, and using acoustics for emergency vehicle localization presents challenges related to sensor range and real-world noise. Addressing these issues, this study introduces a novel solution: combining visual and audio data for enhanced detection and localization of emergency vehicles in road networks. This multi-modal approach aims to bolster accuracy and robustness in emergency vehicle management. The proposed methodology consists of several key steps. The presence of an emergency vehicle is initially detected through pre-processing of visual images, involving the removal of clutter and occlusions via an adaptive background model. Subsequently, a cell-wise classification strategy utilizing a customized Visual Geometry Group Network (VGGNet) deep learning model is employed to determine the presence of emergency vehicles within individual cells. To further reinforce the accuracy of emergency vehicle presence detection, the outcomes of an audio data analysis are integrated: spectral features are extracted from the audio streams and classified with a support vector machine (SVM) model. The fusion of information from both visual and audio sources is used to construct a more comprehensive and refined traffic state map, which facilitates the effective management of emergency vehicle transit. In empirical evaluations, the proposed solution demonstrates its capability to mitigate challenges like visual clutter, occlusions, and variations in traffic density, which are common issues encountered in traditional visual analysis methods. Notably, the proposed approach achieves an impressive accuracy rate of approximately 98.15% in the localization of emergency vehicles.
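The audio branch (spectral features extracted from audio streams, then SVM classification) might look like the following sketch. The specific descriptors (spectral centroid and bandwidth), the synthetic siren-vs-background clips, and the sampling rate are illustrative assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def spectral_features(signal, sr):
    """Two crude spectral descriptors: the centroid and bandwidth
    of the magnitude spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * mag) / np.sum(mag))
    return np.array([centroid, bandwidth])

rng = np.random.default_rng(1)
sr, n = 8000, 2048
t = np.arange(n) / sr

def make_clip(siren):
    # Hypothetical data: a siren as a high tone, background as a low hum.
    f = 1500.0 if siren else 200.0
    return np.sin(2 * np.pi * f * t) + 0.02 * rng.normal(size=n)

labels = [1] * 20 + [0] * 20
X = np.array([spectral_features(make_clip(bool(lab)), sr) for lab in labels])
y = np.array(labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
pred = clf.predict(spectral_features(make_clip(True), sr)[None, :])
```

In the fused system, this per-clip SVM decision would be combined with the VGGNet cell-wise visual detections before the traffic state map is updated.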
DNN-HHOA: Deep Neural Network Optimization-Based Tabular Data Extraction from Compound Document Images
IF 1.6 Q3 Computer Science Pub Date : 2024-01-23 DOI: 10.1142/s021946782550010x
Devendra Tiwari, Anand Gupta, Rituraj Soni
Text information extraction from a tabular structure within a compound document image (CDI) is crucial for understanding the document. The main objective of text extraction is to extract only the helpful information, since tabular data represent the relations between text items in a tuple. Text in an image may have low contrast, varying style, size, alignment, and orientation, and a complex background. This work presents a three-step tabular text extraction process comprising pre-processing, separation, and extraction. The pre-processing step uses a guided image filter to remove various kinds of noise from the image. Improved binomial thresholding (IBT) then separates the text from the image, and the tabular text is recognized and extracted from the CDI using a deep neural network (DNN). In this work, the weights of the DNN layers are optimized by the Harris Hawk optimization algorithm (HHOA). The obtained text and associated information can be used in many ways, including replicating the document in digital format, information retrieval, and text summarization. The proposed process is applied comprehensively to the UNLV, TableBank, and ICDAR 2013 image datasets. The complete procedure is implemented in Python, and precision-metric performance is verified.