Journal of Electronic Imaging: Latest Publications

LGD-FCOS: driver distraction detection using improved FCOS based on local and global knowledge distillation
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043046
Kunbiao Li, Xiaohui Yang, Jing Wang, Feng Zhang, Tao Xu
Ensuring safety on the road is crucial, and detecting driving distractions plays a vital role in achieving this goal. Accurate identification of distracted driving behaviors facilitates prompt intervention, thereby contributing to a reduction in accidents. We introduce an advanced fully convolutional one-stage (FCOS) object detection algorithm tailored for driving distraction detection that leverages the knowledge distillation framework. Our proposed methodology enhances the conventional FCOS algorithm through the integration of the selective kernel split-attention module. This module bolsters the performance of the backbone network, ResNet, leading to a substantial improvement in the accuracy of the FCOS object detection algorithm. In addition, we incorporate a knowledge distillation framework equipped with a novel local and global knowledge distillation loss function. This framework enables the student network to achieve accuracy comparable to that of the teacher network while maintaining a reduced parameter count. The outcomes of our approach are promising, achieving a remarkable accuracy of 92.25% with a compact model size of 31.85 million parameters. This advancement paves the way for more efficient and accurate distracted driving detection systems, ultimately contributing to enhanced road safety.
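To make the local-and-global distillation idea concrete, the sketch below shows one plausible form of such a loss in PyTorch: a local term that matches teacher and student feature maps only on foreground pixels, and a global term that matches channel-wise statistics. The mask construction, weighting factors, and loss forms are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def local_global_kd_loss(student_feat, teacher_feat, fg_mask,
                         w_local=1.0, w_global=0.5):
    """student_feat, teacher_feat: (N, C, H, W); fg_mask: (N, 1, H, W) in {0, 1}."""
    # Local term: match teacher features only on foreground (object) pixels.
    diff = (student_feat - teacher_feat) ** 2
    local = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1.0)
    # Global term: match channel-wise global statistics of the two maps.
    s_glob = student_feat.mean(dim=(2, 3))
    t_glob = teacher_feat.mean(dim=(2, 3))
    return w_local * local + w_global * F.mse_loss(s_glob, t_glob)

# Training objective: detection_loss + local_global_kd_loss(s_feat, t_feat, mask)
```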
Citations: 0
Fast and robust object region segmentation with self-organized lattice Boltzmann based active contour method
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043050
Fatema A. Albalooshi, Vijayan K. Asari
We propose an approach leveraging the power of self-organizing maps (SOMs) in conjunction with a multiscale local image fitting (LIF) level-set function to enhance the capabilities of the region-based active contour model (ACM). In addition, we employ the lattice Boltzmann method (LBM) to ensure efficient convergence during the segmentation process. The SOM learns the underlying patterns and structures of both the background region and the object-of-interest region in an image, allowing for more accurate and robust segmentation results. Our multiscale LIF level-set approach incorporates image-specific fitting criteria into the energy functional, drawing on the features extracted by the SOM. Finally, the LBM is utilized to solve the level-set equation and evolve the contour, allowing for faster contour evolution. To evaluate the effectiveness of our approach, we performed our experiments on the challenging Pascal Visual Object Classes Challenge 2012 dataset. This dataset consists of images containing objects with diverse characteristics, such as illumination variations, shadows, occlusions, scale changes, and cluttered backgrounds. Our experimental results highlight the efficiency and robustness of the proposed method in achieving accurate segmentation. In terms of accuracy, our approach outperforms state-of-the-art learning-based ACMs, reaching a precision value of up to 93%. Moreover, our approach also demonstrates improvements in computation time, reducing it by 76% compared with state-of-the-art methods. By integrating SOMs and the LBM, we enhance the efficiency of the segmentation process. This enables us to achieve accurate segmentation within reasonable time frames, making our method practical for real-world applications. Furthermore, we conducted experiments on medical imagery and thermal imagery, which yielded precise results.
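As a rough illustration of the SOM stage, the following minimal NumPy sketch trains a small one-dimensional SOM on pixel colors so that the learned prototypes can label pixels as object-like or background-like. The grid size, learning-rate schedule, and use of raw RGB features are assumptions, not the authors' configuration.

```python
import numpy as np

def train_som(pixels, n_nodes=4, epochs=10, lr0=0.5, seed=0):
    """pixels: (N, 3) array of RGB samples; returns (n_nodes, 3) prototype weights."""
    rng = np.random.default_rng(seed)
    weights = pixels[rng.choice(len(pixels), n_nodes, replace=False)].astype(float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)                      # decaying learning rate
        for x in pixels[rng.permutation(len(pixels))]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            for j in range(n_nodes):                           # 1D grid neighborhood
                h = np.exp(-abs(j - bmu))                      # neighborhood strength
                weights[j] += lr * h * (x - weights[j])
    return weights

# Train one SOM on object-region samples and one on background samples,
# then label each pixel by which SOM holds its nearest prototype.
```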
Citations: 0
Spatio-temporal enhancement method based on dense connection structure for compressed video
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043054
Hongyao Li, Xiaohai He, Xiaodong Bi, Shuhua Xiong, Honggang Chen
Under limited bandwidth conditions, video transmission often employs lossy compression to reduce the data volume, inevitably introducing compression noise. Quality enhancement of compressed videos can effectively recover the information lost during the compression process. Currently, multi-frame quality enhancement of compressed videos shows performance advantages over single-frame methods, as it exploits the temporal correlation of videos. Methods based on deformable convolutions obtain spatio-temporal fusion features for reconstruction through multi-frame alignment. However, due to their limited use of deep information and their sensitivity to alignment accuracy, these methods yield suboptimal results, especially in scenarios with scene changes and intense motion. To overcome these limitations, we propose a dense network-based quality enhancement method to obtain more accurate spatio-temporal fusion features. Specifically, deep spatial features are first extracted from the frames to be enhanced using dense connections. These are then combined, through convolution and an attention mechanism, with the aligned features obtained from deformable convolution, so that the network adaptively attends to the useful branches. Finally, the enhanced frames are obtained through a quality enhancement module with a dense connection structure. The experimental results show that when the quantization parameter is 37, the proposed method improves the average peak signal-to-noise ratio by 0.99 dB in the lowdelay_P configuration.
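The dense-connection idea can be illustrated with a short PyTorch module in which each layer consumes the concatenation of all earlier feature maps, so shallow and deep information both reach the reconstruction stage. The growth rate and depth below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of the input and all earlier outputs."""
    def __init__(self, in_ch, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=1),
                nn.ReLU(inplace=True)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)  # shallow and deep features are all kept

# blk = DenseBlock(32); y = blk(torch.randn(1, 32, 64, 64))  # y: (1, 96, 64, 64)
```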
Citations: 0
Fusion 3D object tracking method based on region and point cloud registration
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043048
Yixin Jin, Jiawei Zhang, Yinhua Liu, Wei Mo, Hua Chen
Tracking rigid objects in three-dimensional (3D) space and estimating their 6DoF poses are essential tasks in the field of computer vision. In recent years, region-based 3D tracking methods have emerged as the optimal solution for tracking weakly textured objects within intricate scenes. However, tracking robustness in situations such as partial occlusion and similarly colored backgrounds is relatively poor. To address this issue, an improved region-based tracking method is proposed for achieving accurate 3D object tracking in the presence of partial occlusion and similarly colored backgrounds. First, a regional cost function based on the correspondence line is adopted, and a step function is proposed to alleviate the misclassification of sampling points in scenes. Afterward, in order to reduce the influence of similarly colored backgrounds and partial occlusion on tracking performance, a weight function that fuses color and distance information of the object contour is proposed. Finally, the transformation matrix of the inter-frame motion obtained by the above region-based tracking method is used to initialize the model point cloud, and an improved point cloud registration method is adopted to achieve accurate registration between the model point cloud and the object point cloud, further realizing accurate object tracking. The experiments are conducted on the region-based object tracking (RBOT) dataset and on real scenes. The results demonstrate that the proposed method outperforms the state-of-the-art region-based 3D object tracking methods. On the RBOT dataset, the average tracking success rate is improved by 0.5% across five image sequences. In addition, in real scenes with similarly colored backgrounds and partial occlusion, the average tracking accuracy is improved by 0.28 and 0.26 mm, respectively.
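For the point cloud registration step, a single point-to-point ICP iteration can be sketched as below; this generic SVD-based (Kabsch) update stands in for the paper's improved registration method, and the brute-force nearest-neighbor search and absence of outlier rejection are simplifications for brevity.

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration. src: (N, 3), dst: (M, 3).
    Returns (R, t) moving src toward dst."""
    # Brute-force nearest neighbors from src into dst.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    matched = dst[d2.argmin(axis=1)]
    # Closed-form rigid transform (Kabsch / SVD).
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Iterate: src = src @ R.T + t, repeating until the update is negligible.
```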
Citations: 0
Coded target recognition algorithm for vision measurement
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043058
Peng Zhang, Qing Liu, Shengpeng Li, Fei Liu, Wenjing Liu
Circularly coded targets are widely used as feature points to be measured in 3D measurement, target tracking, augmented reality, and other fields. Traditional coded-target recognition algorithms are easily affected by illumination changes and excessive shooting angles, which significantly reduce recognition accuracy. Therefore, a new coded-target recognition algorithm is required to reduce the effects of illumination and angle on the recognition process. The influence of illumination on the recognition of coded targets is analyzed in depth, and the advantages and disadvantages of traditional algorithms are discussed. A new adaptive-threshold image segmentation method is designed, which, in contrast to traditional algorithms, incorporates the feature information of coded targets in determining the image segmentation threshold. The experimental results show that this method significantly reduces the influence of illumination variations and cluttered backgrounds on image segmentation. Similarly, the influence of different angles on the recognition of coded targets is studied. The coded target is decoded by radial sampling of a dense point network, which effectively reduces the influence of angle on the recognition process and improves both the recognition accuracy of coded targets and the robustness of the algorithm. In addition, further experiments verify that the proposed detection and recognition algorithm extracts and identifies coded targets with high positioning accuracy and a high decoding success rate. It achieves accurate positioning even in complex environments and meets the needs of industrial measurement.
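A minimal OpenCV pipeline along these lines might look like the sketch below: a local mean-based adaptive threshold followed by contour extraction and ellipse fitting to find candidate target centers. The file name, block size, and offset are hypothetical, and the paper's feature-informed threshold selection is not reproduced here.

```python
import cv2

img = cv2.imread("coded_target.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
binary = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV,
    blockSize=31, C=7)                       # local mean threshold, offset of 7
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
for c in contours:
    # An ellipse fit needs at least 5 points; very small blobs are noise.
    if len(c) >= 5 and cv2.contourArea(c) > 50:
        (cx, cy), (w, h), angle = cv2.fitEllipse(c)
        print(f"candidate target center: ({cx:.1f}, {cy:.1f})")
```

Decoding would then sample the ring band around each candidate center radially, which is the step the abstract describes as radial sampling of a dense point network.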
Citations: 0
Settlement detection from satellite imagery using fully convolutional network
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043056
Tayaba Anjum, Ahsan Ali, Muhammad Tahir Naseem
Geospatial information is essential for development planning, for example in land and resource management. Existing research mainly focuses on multi-spectral or panchromatic images with specific sensor details. Incorporating multi-sensor panchromatic images at different scales makes the segmentation problem challenging. In this work, we propose a pixel-based, globally trained deep learning model to improve segmentation results over existing patch-based networks. The proposed model uses an encoder-decoder mechanism for semantic segmentation: convolution and pooling layers in the encoding phase, and transposed convolution and convolution layers in the decoding phase. Experiments on benchmark images show a correct detection rate of about 98.95% and a false detection rate of 0.07% for our proposed methodology. We demonstrate the effectiveness of the proposed methodology through comparisons with previous work.
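A toy version of such an encoder-decoder FCN is sketched below in PyTorch, with convolution and pooling layers in the encoder and transposed convolutions in the decoder, as the abstract describes; the channel widths and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Per-pixel two-class labeling (e.g., settlement vs. non-settlement)."""
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                  # /2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                                  # /4
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(inplace=True),
            nn.Conv2d(16, n_classes, 1))                      # per-pixel logits

    def forward(self, x):
        return self.decoder(self.encoder(x))

# logits = TinyFCN()(torch.randn(1, 1, 128, 128))  # logits: (1, 2, 128, 128)
```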
Citations: 0
DeepLab-Rail: semantic segmentation network for railway scenes based on encoder-decoder structure
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043038
Qingsong Zeng, Linxuan Zhang, Yuan Wang, Xiaolong Luo, Yannan Chen
Understanding perimeter objects and environment changes in railway scenes is crucial for ensuring the safety of train operation. Semantic segmentation is the basis of intelligent perception and scene understanding. Railway scene categories are complex, and effective features are challenging to extract. This work proposes DeepLab-Rail, a semantic segmentation network based on the classic yet effective encoder-decoder structure. It contains a lightweight feature extraction backbone embedded with a channel attention (CA) mechanism to keep computational complexity low. To enrich the receptive fields of the convolutional modules, we design a parallel and cascaded convolution module called compound-atrous spatial pyramid pooling, and a combination of dilation rates is selected through experiments to obtain multi-scale features. To fully use both the shallow features and the high-level features, an efficient CA mechanism is introduced, and a mixed loss function is designed to address the unbalanced label categories of the dataset. Finally, experimental results on the RailSem19 railway dataset show that the mean intersection over union reaches 65.52% and the pixel accuracy reaches 88.48%. The segmentation performance on easily confused railway facilities, such as signal lights and catenary pillars, is significantly improved and, to the best of our knowledge, surpasses other advanced methods.
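The parallel branch of such a compound-atrous module can be sketched as a set of dilated convolutions applied side by side and fused by a 1×1 projection, as below; the dilation rates (1, 6, 12, 18) follow common ASPP practice and are assumptions here, as is the omission of the cascade branch.

```python
import torch
import torch.nn as nn

class ParallelAtrous(nn.Module):
    """Parallel dilated 3x3 convolutions fused by a 1x1 projection."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concatenation mixes
        # multi-scale context before the 1x1 projection.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

# y = ParallelAtrous(64, 32)(torch.randn(1, 64, 32, 32))  # y: (1, 32, 32, 32)
```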
Citations: 0
Deep inner-knuckle-print recognition using lightweight Siamese network
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043034
Hongxia Wang, Hongwu Yuan
Texture features and stability have attracted much attention in the field of biometric recognition. The inner-knuckle print is unique and hard to forge, so it is widely used in personal identity authentication, criminal investigation, and other fields. In recent years, the rapid development of deep learning technology has brought new opportunities for inner-knuckle-print recognition. We propose a deep inner-knuckle-print recognition method based on a network named LSKNet. By establishing a lightweight Siamese network model and combining it with a robust cost function, we realize efficient and accurate recognition of the inner-knuckle print. Compared with traditional methods and other deep learning methods, the network has lower model complexity and computational resource requirements, which enables it to run on lower hardware configurations. In addition, this paper also uses the knuckle prints of all four fingers for concatenated fusion recognition. Experimental results demonstrate that this method achieves satisfactory results in the task of inner-knuckle-print recognition.
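A minimal Siamese setup of this general kind is sketched below: a shared embedding network applied to two knuckle-print images and a contrastive loss that pulls genuine pairs together and pushes impostor pairs apart. The backbone, embedding size, margin, and loss choice are assumptions rather than LSKNet's actual lightweight design or its robust cost function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEmbed(nn.Module):
    """Shared CNN that maps a grayscale print to an embedding vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, a, b):
        return self.net(a), self.net(b)  # one shared backbone, two inputs

def contrastive_loss(za, zb, same, margin=1.0):
    """same: float tensor, 1 if the two prints come from the same finger, else 0."""
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) +
            (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()
```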
Citations: 0
Fine-tuned Siamese neural network–based multimodal vein biometric system with hybrid firefly–particle swarm optimization
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043035
Gurunathan Velliangiri, Sudhakar Radhakrishnan
Recent advancements in biometric recognition focus on vein pattern–based person authentication systems. We present a multimodal biometric system using dorsal hand and finger vein images. By combining Siamese neural networks (SNNs) with hybrid firefly–particle swarm optimization (FF-PSO), we optimize finger and dorsal vein identification and classification. Using FF-PSO to tune SNN parameters is an innovative hybrid optimization approach designed to address the complexities of vein pattern recognition. The proposed system is tested on two public databases: the SDUMLA-HMT finger vein dataset and the Dr. Badawi hand vein dataset. The efficacy of the method is assessed using performance measures such as recall, accuracy, precision, F1 score, false acceptance rate, false rejection rate, and equal error rate. The experimental findings demonstrate that the proposed system achieves an accuracy of 99.5% with the fine-tuned SNN and FF-PSO techniques and the preprocessing module. The proposed system is also compared with various existing state-of-the-art techniques.
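How the firefly and particle-swarm updates are combined is not specified in the abstract, so the sketch below shows one simple alternating scheme in NumPy: a PSO velocity step followed by a firefly attraction step, applied to a generic cost function over a normalized search box. All coefficients are illustrative assumptions.

```python
import numpy as np

def ffpso_minimize(f, dim, n=20, iters=50, seed=0):
    """Minimize f over [-1, 1]^dim with an alternating PSO / firefly scheme."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n, dim))            # particle positions
    v = np.zeros((n, dim))                      # particle velocities
    pbest = x.copy()
    pcost = np.array([f(p) for p in x])
    for _ in range(iters):
        g = pbest[pcost.argmin()]               # global best position
        # PSO half-step: inertia + cognitive + social pull.
        r1, r2 = rng.random((2, n, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        # Firefly half-step: move toward brighter (lower-cost) particles
        # with distance-decayed attraction plus a small random kick.
        cost = np.array([f(p) for p in x])
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:
                    d2 = ((x[i] - x[j]) ** 2).sum()
                    x[i] += np.exp(-d2) * (x[j] - x[i]) \
                            + 0.1 * (rng.random(dim) - 0.5)
        cost = np.array([f(p) for p in x])
        better = cost < pcost
        pbest[better], pcost[better] = x[better], cost[better]
    return pbest[pcost.argmin()], pcost.min()

# Example: best, val = ffpso_minimize(lambda p: ((p - 0.3) ** 2).sum(), dim=2)
```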
Citations: 0
Joint merging and pruning: adaptive selection of better token compression strategy
IF 1.1 | CAS Tier 4 (Computer Science) | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-08-01 | DOI: 10.1117/1.jei.33.4.043045
Wei Peng, Liancheng Zeng, Lizhuo Zhang, Yue Shen
The vision transformer (ViT) is widely used for artificial intelligence tasks and has made significant advances in a variety of computer vision tasks. However, due to the quadratic interaction between tokens, the ViT model is inefficient, which greatly limits its application in real scenarios. In recent years, it has been observed that not all tokens contribute equally to the final prediction of the model, so token compression methods have been proposed, mainly divided into token pruning and token merging. Yet we believe that neither pruning alone to reduce non-critical tokens nor merging alone to reduce similar tokens is an optimal strategy for token compression. To overcome this challenge, this work proposes a token compression framework, joint merging and pruning (JMP), which adaptively selects the better token compression strategy based on the similarity between critical tokens and non-critical tokens in each sample. JMP effectively reduces computational complexity while maintaining model performance and does not require additional trainable parameters, achieving a good balance between efficiency and performance. Taking DeiT-S as an example, JMP reduces floating point operations by 35% and increases throughput by more than 45% while decreasing accuracy by only 0.2% on ImageNet.
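A toy token-compression step in this spirit is sketched below in PyTorch: tokens are scored by class-token attention, the weakest are pruned, and each pruned token is merged (averaged) into its most similar kept token. The scoring rule and keep ratio are assumptions, and JMP's adaptive per-sample choice between pruning and merging is not reproduced here.

```python
import torch
import torch.nn.functional as F

def compress_tokens(tokens, cls_attn, keep_ratio=0.7):
    """tokens: (N, D) patch tokens; cls_attn: (N,) attention from the class token."""
    n_keep = max(1, int(tokens.size(0) * keep_ratio))
    order = cls_attn.argsort(descending=True)
    kept, dropped = tokens[order[:n_keep]], tokens[order[n_keep:]]
    if dropped.numel():
        # Merge: send each pruned token to its most similar (cosine) kept token.
        sim = F.normalize(dropped, dim=1) @ F.normalize(kept, dim=1).T
        target = sim.argmax(dim=1)
        merged = kept.clone()
        merged.index_add_(0, target, dropped)
        counts = torch.ones(n_keep)
        counts.index_add_(0, target, torch.ones(len(dropped)))
        kept = merged / counts[:, None]            # average the merged groups
    return kept

# kept = compress_tokens(torch.randn(196, 384), torch.rand(196))  # 196 -> 137 tokens
```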
Citations: 0