Receptive field enhancement and attention feature fusion network for underwater object detection
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033007
Huipu Xu, Zegang He, Shuo Chen
Underwater environments have characteristics such as unclear imaging and complex backgrounds, which lead to poor performance when mainstream object detection models are applied directly. To improve the accuracy of underwater object detection, we propose an object detection model, RF-YOLO, which uses a receptive field enhancement (RFE) module in the backbone network to enlarge the receptive field and extract more effective features. We design a free-channel iterative attention feature fusion module that reconstructs the neck network and fuses feature layers of different scales to achieve cross-channel attention feature fusion. We adopt Scylla intersection over union (SIoU) as the loss function, which steers training toward the optimum through its angle cost, distance cost, shape cost, and IoU cost. Because the added modules increase the number of network parameters and make convergence to the optimal state harder, we also propose a training method that effectively mines the performance of the detection network. Experiments show that the proposed RF-YOLO achieves a mean average precision of 87.56% and 86.39% on the URPC2019 and URPC2020 datasets, respectively. Comparative and ablation experiments verify that the proposed model has higher detection accuracy in complex underwater environments.
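For readers unfamiliar with the SIoU composition mentioned above, the following minimal NumPy sketch shows how angle, distance, shape, and IoU costs are commonly combined into one box-regression loss. It follows the widely cited SIoU formulation rather than RF-YOLO's exact implementation; the box format and constants are assumptions.

```python
import numpy as np

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """Sketch of an SIoU-style loss for boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU term
    inter_w = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    inter_h = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Angle cost: how far the center offset deviates from an axis-aligned direction
    sigma = np.hypot(gx - px, gy - py) + eps
    sin_alpha = abs(gy - py) / sigma
    angle_cost = np.sin(2 * np.arcsin(min(sin_alpha, 1.0)))

    # Distance cost between centers, modulated by the angle cost via gamma
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)  # enclosing box width
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)  # enclosing box height
    gamma = 2.0 - angle_cost
    dist_cost = sum(1 - np.exp(-gamma * r)
                    for r in (((gx - px) / (cw + eps)) ** 2,
                              ((gy - py) / (ch + eps)) ** 2))

    # Shape cost: relative mismatch in width and height
    shape_cost = sum((1 - np.exp(-w)) ** theta
                     for w in (abs(pw - gw) / (max(pw, gw) + eps),
                               abs(ph - gh) / (max(ph, gh) + eps)))

    return 1 - iou + (dist_cost + shape_cost) / 2

print(siou_loss((50, 50, 20, 30), (55, 48, 22, 28)))
```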
{"title":"Receptive field enhancement and attention feature fusion network for underwater object detection","authors":"Huipu Xu, Zegang He, Shuo Chen","doi":"10.1117/1.jei.33.3.033007","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033007","url":null,"abstract":"Underwater environments have characteristics such as unclear imaging and complex backgrounds that lead to poor performance when applying mainstream object detection models directly. To improve the accuracy of underwater object detection, we propose an object detection model, RF-YOLO, which uses a receptive field enhancement (RFE) module in the backbone network to finish RFE and extract more effective features. We design the free-channel iterative attention feature fusion module to reconstruct the neck network and fuse different scales of feature layers to achieve cross-channel attention feature fusion. We use Scylla-intersection over union (SIoU) as the loss function of the model, which makes the model converge to the optimal direction of training through the angle cost, distance cost, shape cost, and IoU cost. The network parameters increase after adding modules, and the model is not easy to converge to the optimal state, so we propose a training method that effectively mines the performance of the detection network. Experiments show that the proposed RF-YOLO achieves a mean average precision of 87.56% and 86.39% on the URPC2019 and URPC2020 datasets, respectively. Through comparative experiments and ablation experiments, it was verified that the proposed network model has a higher detection accuracy in complex underwater environments.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"18 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Posture-guided part learning for fine-grained image categorization
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033013
Wei Song, Dongmei Chen
The challenge in fine-grained image classification lies in distinguishing subtle differences among fine-grained images. Existing image classification methods often explore information only in isolated regions, without considering the relationships among these regions, resulting in incomplete information and a tendency to focus on individual parts. Posture information is hidden among these parts and therefore plays a crucial role in differentiating among similar categories. We thus propose a posture-guided part learning framework capable of extracting the hidden posture information among regions. In this framework, the dual-branch feature enhancement module (DBFEM) highlights discriminative information related to fine-grained objects by extracting attention information between the feature space and channels. The part selection module selects multiple discriminative parts based on the attention information from DBFEM. Building on this, the posture feature fusion module extracts semantic features from the discriminative parts and constructs posture features among different parts from these semantic features. Finally, by fusing the part semantic features with the posture features, a comprehensive representation of fine-grained object features is obtained, aiding in differentiating among similar categories. Extensive evaluations on three benchmark datasets demonstrate the competitiveness of the proposed framework compared with state-of-the-art methods.
{"title":"Posture-guided part learning for fine-grained image categorization","authors":"Wei Song, Dongmei Chen","doi":"10.1117/1.jei.33.3.033013","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033013","url":null,"abstract":"The challenge in fine-grained image classification tasks lies in distinguishing subtle differences among fine-grained images. Existing image classification methods often only explore information in isolated regions without considering the relationships among these parts, resulting in incomplete information and a tendency to focus on individual parts. Posture information is hidden among these parts, so it plays a crucial role in differentiating among similar categories. Therefore, we propose a posture-guided part learning framework capable of extracting hidden posture information among regions. In this framework, the dual-branch feature enhancement module (DBFEM) highlights discriminative information related to fine-grained objects by extracting attention information between the feature space and channels. The part selection module selects multiple discriminative parts based on the attention information from DBFEM. Building upon this, the posture feature fusion module extracts semantic features from discriminative parts and constructs posture features among different parts based on these semantic features. Finally, by fusing part semantic features with posture features, a comprehensive representation of fine-grained object features is obtained, aiding in differentiating among similar categories. Extensive evaluations on three benchmark datasets demonstrate the competitiveness of the proposed framework compared with state-of-the-art methods.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"23 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033002
Bingyin Tang, Fan Feng
We introduce a method for efficient and expressive high-resolution image synthesis, harnessing the power of variational autoencoders (VAEs) and transformers with sparse attention (SA) mechanisms. By utilizing VAEs, we can establish a context-rich vocabulary of image constituents, thereby capturing intricate image features in a superior manner compared with traditional techniques. Subsequently, we employ SA mechanisms within our transformer model, improving computational efficiency while dealing with long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model successfully integrates both nonspatial and spatial information while also incorporating temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate our method’s effectiveness in semantically guided synthesis of megapixel images. Our findings substantiate this method as a significant contribution to the field of high-resolution image synthesis.
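As a rough illustration of the sparse attention idea in the abstract above, the sketch below restricts each query token to a local window of keys. This is a generic banded pattern written as a mask (the full score matrix is still materialized here; efficient implementations avoid that), not the paper's specific sparsity scheme, and all tensor shapes are assumptions.

```python
import torch

def local_sparse_attention(q, k, v, window=64):
    """Windowed self-attention: each query attends only to keys within
    +/- `window` positions, cutting the effective context for long token
    sequences such as flattened high-resolution image grids."""
    # q, k, v: (batch, seq_len, dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, L, L)
    idx = torch.arange(q.size(1), device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window   # (L, L) boolean band
    scores = scores.masked_fill(~band, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

tokens = torch.randn(2, 256, 64)
out = local_sparse_attention(tokens, tokens, tokens, window=16)
print(out.shape)  # torch.Size([2, 256, 64])
```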
{"title":"Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms","authors":"Bingyin Tang, Fan Feng","doi":"10.1117/1.jei.33.3.033002","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033002","url":null,"abstract":"We introduce a method for efficient and expressive high-resolution image synthesis, harnessing the power of variational autoencoders (VAEs) and transformers with sparse attention (SA) mechanisms. By utilizing VAEs, we can establish a context-rich vocabulary of image constituents, thereby capturing intricate image features in a superior manner compared with traditional techniques. Subsequently, we employ SA mechanisms within our transformer model, improving computational efficiency while dealing with long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model successfully integrates both nonspatial and spatial information while also incorporating temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate our method’s effectiveness in semantically guided synthesis of megapixel images. Our findings substantiate this method as a significant contribution to the field of high-resolution image synthesis.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"15 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test-time adaptation via self-training with future information
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033012
Xin Wen, Hao Shen, Zhongqiu Zhao
Test-time adaptation (TTA) aims to address potential differences in data distribution between the training and testing phases by modifying a pretrained model based on each specific test sample. This process is especially crucial for deep learning models, as they often encounter frequent changes in the testing environment. Currently, popular TTA methods rely primarily on pseudo-labels (PLs) as supervision signals and fine-tune the model through backpropagation. Consequently, the success of the model’s adaptation depends directly on the quality of the PLs. High-quality PLs can enhance the model’s performance, whereas low-quality ones may lead to poor adaptation results. Intuitively, if the PLs predicted by the model for a given sample remain consistent in both the current and future states, it suggests a higher confidence in that prediction. Using such consistent PLs as supervision signals can greatly benefit long-term adaptation. Nevertheless, this approach may induce overconfidence in the model’s predictions. To counter this, we introduce a regularization term that penalizes overly confident predictions. Our proposed method is highly versatile and can be seamlessly integrated with various TTA strategies, making it immensely practical. We investigate different TTA methods on three widely used datasets (CIFAR10C, CIFAR100C, and ImageNetC) with different scenarios and show that our method achieves competitive or state-of-the-art accuracies on all of them.
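A minimal sketch of the kind of objective described above, pseudo-label cross-entropy plus a penalty on overconfident outputs, is shown below. The confidence threshold, the negative-entropy form of the regularizer, and its weight are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def tta_loss(logits, pl_logits, lam=0.1, conf_thresh=0.9):
    """Pseudo-label TTA objective with an anti-overconfidence term.

    `pl_logits` are the predictions used to build pseudo-labels; `logits`
    are the adapted model's outputs on the same test batch.
    """
    probs = pl_logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf >= conf_thresh                           # keep confident pseudo-labels only
    ce = F.cross_entropy(logits[mask], pseudo[mask]) if mask.any() else logits.sum() * 0.0

    # Negative entropy is large when predictions are overconfident, so adding it
    # (weighted by lam) penalizes overly peaked output distributions.
    p = logits.softmax(dim=-1)
    neg_entropy = (p * p.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return ce + lam * neg_entropy

loss = tta_loss(torch.randn(32, 10, requires_grad=True), torch.randn(32, 10))
print(loss.item())
```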
{"title":"Test-time adaptation via self-training with future information","authors":"Xin Wen, Hao Shen, Zhongqiu Zhao","doi":"10.1117/1.jei.33.3.033012","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033012","url":null,"abstract":"Test-time adaptation (TTA) aims to address potential differences in data distribution between the training and testing phases by modifying a pretrained model based on each specific test sample. This process is especially crucial for deep learning models, as they often encounter frequent changes in the testing environment. Currently, popular TTA methods rely primarily on pseudo-labels (PLs) as supervision signals and fine-tune the model through backpropagation. Consequently, the success of the model’s adaptation depends directly on the quality of the PLs. High-quality PLs can enhance the model’s performance, whereas low-quality ones may lead to poor adaptation results. Intuitively, if the PLs predicted by the model for a given sample remain consistent in both the current and future states, it suggests a higher confidence in that prediction. Using such consistent PLs as supervision signals can greatly benefit long-term adaptation. Nevertheless, this approach may induce overconfidence in the model’s predictions. To counter this, we introduce a regularization term that penalizes overly confident predictions. Our proposed method is highly versatile and can be seamlessly integrated with various TTA strategies, making it immensely practical. We investigate different TTA methods on three widely used datasets (CIFAR10C, CIFAR100C, and ImageNetC) with different scenarios and show that our method achieves competitive or state-of-the-art accuracies on all of them.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion trajectory reconstruction degree: a key frame selection criterion for surveillance video
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033009
Yunzuo Zhang, Yameng Liu, Jiayu Zhang, Shasha Zhang, Shuangshuang Wang, Yu Cheng
The primary focus of key frame extraction lies in capturing changes in the motion state from surveillance videos and treating them as crucial content. However, existing key frame evaluation indicators cannot accurately assess whether an algorithm captures such changes. Hence, key frame extraction methods are assessed here from the viewpoint of target trajectory reconstruction. The motion trajectory reconstruction degree (MTRD), a key frame selection criterion based on preserving the target's global and local motion information, is then proposed. This evaluation indicator first extracts key frames with various key frame extraction methods and reconstructs the motion trajectory from these key frames using a linear interpolation algorithm. The original motion trajectories of the target are then quantified and compared with the reconstructed motion trajectories. The smaller the MTRD discrepancy is, the better the trajectories overlap, and the more accurately the key frames extracted by that method describe the video content. Finally, inspired by the MTRD criterion, we develop an MTRD-oriented key frame extraction method for surveillance video. Simulation results demonstrate that MTRD captures variations in the global and local motion states more accurately and is more consistent with human visual perception.
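The reconstruction-and-comparison step described above can be sketched directly: rebuild the per-frame trajectory from key frames by linear interpolation and measure the gap to the original trajectory. Using the mean Euclidean distance as the discrepancy is an assumption; the paper's exact normalization may differ.

```python
import numpy as np

def mtrd(trajectory, key_idx):
    """Trajectory-reconstruction-degree style measure.

    `trajectory` is an (N, 2) array of per-frame target positions and
    `key_idx` the indices of the selected key frames. The trajectory is
    rebuilt by linear interpolation between key frames; smaller return
    values mean better overlap with the original trajectory."""
    n = len(trajectory)
    key_idx = np.sort(np.unique(key_idx))
    frames = np.arange(n)
    rx = np.interp(frames, key_idx, trajectory[key_idx, 0])
    ry = np.interp(frames, key_idx, trajectory[key_idx, 1])
    recon = np.stack([rx, ry], axis=1)
    return np.linalg.norm(recon - trajectory, axis=1).mean()

traj = np.cumsum(np.random.randn(100, 2), axis=0)   # synthetic target track
print(mtrd(traj, [0, 20, 55, 99]))                   # denser key frames -> lower value
```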
{"title":"Motion trajectory reconstruction degree: a key frame selection criterion for surveillance video","authors":"Yunzuo Zhang, Yameng Liu, Jiayu Zhang, Shasha Zhang, Shuangshuang Wang, Yu Cheng","doi":"10.1117/1.jei.33.3.033009","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033009","url":null,"abstract":"The primary focus of key frame extraction lies in extracting changes in the motion state from surveillance videos and considering them to be crucial content. However, existing key frame evaluation indicators cannot accurately assess whether the algorithm can capture them. Hence, key frame extraction methods are assessed from the viewpoint of target trajectory reconstruction. The motion trajectory reconstruction degree (MTRD), a key frame selection criterion based on maintaining target global and local motion information, is then put forth. Initially, this evaluation indicator extracts key frames using various key frame extraction methods and reconstructs the motion trajectory based on these key frames using a linear interpolation algorithm. Then, the original motion trajectories of the target are quantified and compared with the reconstructed set of motion trajectories. The more minor the MTRD discrepancy is, the better the trajectory overlap is, and the more accurate the key frames extracted with this method will be for the description of the video content. Finally, inspired by the novel MTRD criterion, we develop an MTRD-oriented key frame extraction method for the surveillance video. The outcomes of the simulations demonstrate that MTRD can more accurately capture the variations in the global and local motion states and is more compatible with the human visual perception.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HMNNet: research on exposure-based nighttime semantic segmentation
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033015
Yang Yang, Changjiang Liu, Hao Li, Chuan Liu
In recent years, various segmentation models have been developed. However, due to the limited availability of nighttime datasets and the complexity of nighttime scenes, high-performance nighttime semantic segmentation models remain scarce. Analysis of nighttime scenes reveals that the primary challenges are overexposure and underexposure. In view of this, our proposed Histogram Multi-Scale Retinex with Color Restoration and No-Exposure Semantic Segmentation Network model performs semantic segmentation of nighttime scenes and consists of three modules and a multi-head decoder. The three modules, namely Histogram, Multi-Scale Retinex with Color Restoration (MSRCR), and No Exposure (N-EX), aim to enhance the robustness of image segmentation under different lighting conditions. The Histogram module prevents over-fitting to well-lit images, and the MSRCR module enhances images with insufficient lighting, improving object recognition and facilitating segmentation. The N-EX module uses a dark channel prior method to remove excess light covering the surface of an object. Extensive experiments show that the three modules suit different network models and can be inserted and used flexibly. They significantly improve a model's segmentation ability on nighttime images while retaining good generalization ability. When added to the multi-head decoder network, mean intersection over union increases by 6.2% on the nighttime dataset Rebecca and by 1.5% on the daytime dataset CamVid.
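The dark channel prior mentioned for the N-EX module has a standard computation, sketched below. How the resulting map is used to suppress over-exposed regions is specific to the paper and not reproduced here, and the patch size is an assumption.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel prior: per-pixel minimum over the color channels,
    followed by a local minimum filter over a patch. Bright (glare or
    over-exposed) regions stand out as unusually high dark-channel values."""
    # img: float array in [0, 1], shape (H, W, 3)
    per_pixel_min = img.min(axis=2)
    return minimum_filter(per_pixel_min, size=patch)

img = np.random.rand(120, 160, 3).astype(np.float32)
dc = dark_channel(img)
print(dc.shape, float(dc.max()))
```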
{"title":"HMNNet: research on exposure-based nighttime semantic segmentation","authors":"Yang Yang, Changjiang Liu, Hao Li, Chuan Liu","doi":"10.1117/1.jei.33.3.033015","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033015","url":null,"abstract":"In recent years, various segmentation models have been developed successively. However, due to the limited availability of nighttime datasets and the complexity of nighttime scenes, there remains a scarcity of high-performance nighttime semantic segmentation models. Analysis of nighttime scenes has revealed that the primary challenges encountered are overexposure and underexposure. In view of this, our proposed Histogram Multi-scale Retinex with Color Restoration and No-Exposure Semantic Segmentation Network model is based on semantic segmentation of nighttime scenes and consists of three modules and a multi-head decoder. The three modules—Histogram, Multi-Scale Retinex with Color Restoration (MSRCR), and No Exposure (N-EX)—aim to enhance the robustness of image segmentation under different lighting conditions. The Histogram module prevents over-fitting to well-lit images, and the MSRCR module enhances images with insufficient lighting, improving object recognition and facilitating segmentation. The N-EX module uses a dark channel prior method to remove excess light covering the surface of an object. Extensive experiments show that the three modules are suitable for different network models and can be inserted and used at will. They significantly improve the model’s segmentation ability for nighttime images while having good generalization ability. When added to the multi-head decoder network, mean intersection over union increases by 6.2% on the nighttime dataset Rebecca and 1.5% on the daytime dataset CamVid.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"27 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep metric learning method for open-set iris recognition
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033016
Guang Huo, Ruyuan Li, Jianlou Lou, Xiaolu Yu, Jiajun Wang, Xinlei He, Yue Wang
Existing iris recognition methods offer excellent recognition performance for known classes, but they do not perform well when faced with unknown classes. The process of identifying unknown classes is referred to as open-set recognition. To improve the robustness of iris recognition systems, this work integrates hash centers to construct a deep metric learning method for open-set iris recognition, called central similarity based deep hash. It first maps each iris category to a defined hash center using a hash center generation algorithm. Then, OiNet is trained so that each iris texture clusters around its corresponding hash center. For testing, the cosine similarity between each pair of iris textures is calculated to estimate their similarity. Experiments on public datasets, including evaluations both within a dataset and across different datasets, show that our method has substantial performance advantages over other open-set iris recognition algorithms.
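A small sketch of the hash-center idea follows: one common way to generate well-separated binary centers (used in earlier central-similarity hashing work) is to take rows of a Hadamard matrix, and matching is then done by cosine similarity. Whether the paper's generation algorithm is Hadamard-based is an assumption; `make_hash_centers` and the code length are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

def make_hash_centers(num_classes, code_len):
    """Build mutually orthogonal +/-1 hash centers from a Hadamard matrix
    (code_len must be a power of two for scipy.linalg.hadamard)."""
    H = hadamard(code_len)
    return H[:num_classes].astype(np.float32)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

centers = make_hash_centers(num_classes=10, code_len=64)
probe = centers[3] + 0.3 * np.random.randn(64)      # noisy embedding of class 3
scores = [cosine_sim(probe, c) for c in centers]
print(int(np.argmax(scores)))                        # expected: 3
```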
{"title":"Deep metric learning method for open-set iris recognition","authors":"Guang Huo, Ruyuan Li, Jianlou Lou, Xiaolu Yu, Jiajun Wang, Xinlei He, Yue Wang","doi":"10.1117/1.jei.33.3.033016","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033016","url":null,"abstract":"The existing iris recognition methods offer excellent recognition performance for known classes, but they do not perform well when faced with unknown classes. The process of identifying unknown classes is referred to as open-set recognition. To improve the robustness of iris recognition system, this work integrates a hash center to construct a deep metric learning method for open-set iris recognition, called central similarity based deep hash. It first maps each iris category into defined hash centers using a generation hash center algorithm. Then, OiNet is trained to each iris texture to cluster around the corresponding hash center. For testing, cosine similarity is calculated for each pair of iris textures to estimate their similarity. Based on experiments conducted on public datasets, along with evaluations of performance within the dataset and across different datasets, our method demonstrates substantial performance advantages compared with other algorithms for open-set iris recognition.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"127 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
KT-NeRF: multi-view anti-motion blur neural radiance fields
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033006
Yining Wang, Jinyi Zhang, Yuxi Jiang
In the field of three-dimensional (3D) reconstruction, neural radiance fields (NeRF) can implicitly represent high-quality 3D scenes. However, traditional neural radiance fields place very high demands on the quality of the input images. When motion-blurred images are input, NeRF's requirement of multi-view consistency cannot be met, which significantly degrades the quality of the 3D reconstruction. To address this problem, we propose KT-NeRF, which extends NeRF to motion-blurred scenes. Starting from the principle of motion blur, the method is derived from two-dimensional (2D) motion-blurred images to 3D space. A Gaussian process regression model is then introduced to estimate the motion trajectory of the camera for each motion-blurred image, with the aim of learning accurate camera poses at key time stamps within the exposure time. The camera poses at the key time stamps are used as inputs to the NeRF so that it can learn the blur information embedded in the images. Finally, the parameters of the Gaussian process regression model and the NeRF are jointly optimized to achieve multi-view anti-motion blur. Experiments show that KT-NeRF achieves a peak signal-to-noise ratio of 29.4 and a structural similarity index of 0.85, increases of 3.5% and 2.4%, respectively, over existing advanced methods. The learned perceptual image patch similarity is also reduced by 7.1%, to 0.13.
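The Gaussian process step described above, fitting a smooth camera trajectory over the exposure and querying it at key time stamps, can be sketched with scikit-learn as below. The kernel, the translation-only pose representation, and the sample values are assumptions, and the joint optimization with the NeRF is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Fit a smooth camera-translation trajectory over the exposure interval
# and query it at key time stamps (rotations and NeRF coupling omitted).
t_obs = np.array([[0.0], [0.5], [1.0]])            # assumed coarse pose time stamps
xyz_obs = np.array([[0.00, 0.00, 0.00],
                    [0.02, 0.01, 0.00],
                    [0.05, 0.01, -0.01]])          # assumed camera positions

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(t_obs, xyz_obs)

t_key = np.linspace(0.0, 1.0, 7).reshape(-1, 1)    # key time stamps within the exposure
xyz_key = gp.predict(t_key)
print(xyz_key.shape)                                # (7, 3) interpolated camera positions
```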
{"title":"KT-NeRF: multi-view anti-motion blur neural radiance fields","authors":"Yining Wang, Jinyi Zhang, Yuxi Jiang","doi":"10.1117/1.jei.33.3.033006","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033006","url":null,"abstract":"In the field of three-dimensional (3D) reconstruction, neural radiation fields (NeRF) can implicitly represent high-quality 3D scenes. However, traditional neural radiation fields place very high demands on the quality of the input images. When motion blurred images are input, the requirement of NeRF for multi-view consistency cannot be met, which results in a significant degradation in the quality of the 3D reconstruction. To address this problem, we propose KT-NeRF that extends NeRF to motion blur scenes. Based on the principle of motion blur, the method is derived from two-dimensional (2D) motion blurred images to 3D space. Then, Gaussian process regression model is introduced to estimate the motion trajectory of the camera for each motion blurred image, with the aim of learning accurate camera poses at key time stamps during the exposure time. The camera poses at the key time stamps are used as inputs to the NeRF in order to allow the NeRF to learn the blur information embedded in the images. Finally, the parameters of the Gaussian process regression model and the NeRF are jointly optimized to achieve multi-view anti-motion blur. The experiment shows that KT-NeRF achieved a peak signal-to-noise ratio of 29.4 and a structural similarity index of 0.85, an increase of 3.5% and 2.4%, respectively, over existing advanced methods. The learned perceptual image patch similarity was also reduced by 7.1% to 0.13.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"105 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Three-dimensional shape estimation of wires from three-dimensional X-ray computed tomography images of electrical cables
Pub Date: 2024-04-01 | DOI: 10.1117/1.jei.33.3.031209
Shiori Ueda, Kanon Sato, Hideo Saito, Yutaka Hoshina
Electrical cables consist of numerous wires, whose three-dimensional (3D) shape significantly impacts the cables' overall properties, such as bending stiffness. Although X-ray computed tomography (CT) provides a non-destructive way to assess these properties, accurately determining the 3D shape of individual wires from CT images is challenging due to the large number of wires, the low image resolution, and the indistinguishable appearance of the wires. Previous research lacked quantitative evaluation of wire tracking, and its overall accuracy relied heavily on the accuracy of wire detection. In this study, we present a long short-term memory-based approach to wire tracking that improves robustness against detection errors. The proposed method predicts wire positions in subsequent frames based on previous frames. We evaluate its performance using both actual annotated cables and artificially noised annotations. Our method exhibits greater tracking accuracy and robustness to detection errors than the previous method.
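A minimal sketch of the LSTM-based prediction step described above is given below: the network consumes a wire's positions in previous CT frames and predicts its position in the next one. The layer sizes, sequence length, and the `WireTracker` name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class WireTracker(nn.Module):
    """LSTM that maps a wire's (x, y) positions in previous CT frames to a
    predicted (x, y) position in the next frame."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, prev_xy):          # prev_xy: (batch, n_frames, 2)
        out, _ = self.lstm(prev_xy)
        return self.head(out[:, -1])     # predicted (x, y) in the next frame

model = WireTracker()
history = torch.randn(8, 10, 2)          # 8 wires, 10 previous frames each
print(model(history).shape)              # torch.Size([8, 2])
```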
Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device
Pub Date: 2024-03-01 | DOI: 10.1117/1.jei.33.3.031208
Stefano Toigo, Brendon Kasi, Daniele Fornasier, Angelo Cenedese
This paper delineates a comprehensive method that integrates machine vision and deep learning for quality control in an industrial setting. The proposed approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while relying on affordable, compact hardware, and it achieves exceptionally high accuracy on the quality control task with minimal computation time. The developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and for external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By integrating the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work.
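As a loose illustration of the microservice style described above, the sketch below exposes a single HTTP inspection endpoint such as might run on a smart camera. The endpoint name, the toy defect score, and the threshold are placeholders, not the system presented in the paper.

```python
from fastapi import FastAPI, File, UploadFile
import cv2
import numpy as np

# One self-contained inspection microservice: receive an image, compute a
# score, return a pass/fail verdict. Run with: uvicorn service:app --port 8000
app = FastAPI()

@app.post("/inspect")
async def inspect(image: UploadFile = File(...)):
    raw = np.frombuffer(await image.read(), dtype=np.uint8)
    frame = cv2.imdecode(raw, cv2.IMREAD_GRAYSCALE)
    # Placeholder analysis: flag frames whose intensity variance is unusually low
    defect_score = float(frame.var())
    return {"pass": defect_score > 100.0, "score": defect_score}
```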
{"title":"Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device","authors":"Stefano Toigo, Brendon Kasi, Daniele Fornasier, Angelo Cenedese","doi":"10.1117/1.jei.33.3.031208","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.031208","url":null,"abstract":"This paper aims to delineate a comprehensive method that integrates machine vision and deep learning for quality control within an industrial setting. The proposed innovative approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while focusing on the employment of affordable, compact hardware, and it achieves exceptionally high accuracy in performing the quality control task and keeping a minimal computation time. Consequently, the developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By leveraging the integration of the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"88 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}