Receptive field enhancement and attention feature fusion network for underwater object detection
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033007
Huipu Xu, Zegang He, Shuo Chen
Underwater environments have characteristics such as unclear imaging and complex backgrounds, which lead to poor performance when mainstream object detection models are applied directly. To improve the accuracy of underwater object detection, we propose an object detection model, RF-YOLO, which uses a receptive field enhancement (RFE) module in the backbone network to enlarge the receptive field and extract more effective features. We design a free-channel iterative attention feature fusion module that reconstructs the neck network and fuses feature layers of different scales to achieve cross-channel attention feature fusion. We adopt Scylla intersection over union (SIoU) as the loss function, which steers training toward the optimum through its angle cost, distance cost, shape cost, and IoU cost. Because the added modules increase the number of network parameters and make convergence to the optimal state harder, we also propose a training method that effectively mines the performance of the detection network. Experiments show that the proposed RF-YOLO achieves a mean average precision of 87.56% and 86.39% on the URPC2019 and URPC2020 datasets, respectively. Comparative and ablation experiments verify that the proposed model has higher detection accuracy in complex underwater environments.
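For readers unfamiliar with the SIoU composition mentioned above, the following minimal NumPy sketch shows how angle, distance, shape, and IoU costs are commonly combined into one box-regression loss. It follows the widely cited SIoU formulation rather than RF-YOLO's exact implementation; the box format and constants are assumptions.

```python
import numpy as np

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """Sketch of an SIoU-style loss for boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU term
    inter_w = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    inter_h = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Angle cost: how far the center offset deviates from an axis-aligned direction
    sigma = np.hypot(gx - px, gy - py) + eps
    sin_alpha = abs(gy - py) / sigma
    angle_cost = np.sin(2 * np.arcsin(min(sin_alpha, 1.0)))

    # Distance cost between centers, modulated by the angle cost via gamma
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)  # enclosing box width
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)  # enclosing box height
    gamma = 2.0 - angle_cost
    dist_cost = sum(1 - np.exp(-gamma * r)
                    for r in (((gx - px) / (cw + eps)) ** 2,
                              ((gy - py) / (ch + eps)) ** 2))

    # Shape cost: relative mismatch in width and height
    shape_cost = sum((1 - np.exp(-w)) ** theta
                     for w in (abs(pw - gw) / (max(pw, gw) + eps),
                               abs(ph - gh) / (max(ph, gh) + eps)))

    return 1 - iou + (dist_cost + shape_cost) / 2

print(siou_loss((50, 50, 20, 30), (55, 48, 22, 28)))
```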
{"title":"Receptive field enhancement and attention feature fusion network for underwater object detection","authors":"Huipu Xu, Zegang He, Shuo Chen","doi":"10.1117/1.jei.33.3.033007","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033007","url":null,"abstract":"Underwater environments have characteristics such as unclear imaging and complex backgrounds that lead to poor performance when applying mainstream object detection models directly. To improve the accuracy of underwater object detection, we propose an object detection model, RF-YOLO, which uses a receptive field enhancement (RFE) module in the backbone network to finish RFE and extract more effective features. We design the free-channel iterative attention feature fusion module to reconstruct the neck network and fuse different scales of feature layers to achieve cross-channel attention feature fusion. We use Scylla-intersection over union (SIoU) as the loss function of the model, which makes the model converge to the optimal direction of training through the angle cost, distance cost, shape cost, and IoU cost. The network parameters increase after adding modules, and the model is not easy to converge to the optimal state, so we propose a training method that effectively mines the performance of the detection network. Experiments show that the proposed RF-YOLO achieves a mean average precision of 87.56% and 86.39% on the URPC2019 and URPC2020 datasets, respectively. Through comparative experiments and ablation experiments, it was verified that the proposed network model has a higher detection accuracy in complex underwater environments.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"18 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Posture-guided part learning for fine-grained image categorization
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033013
Wei Song, Dongmei Chen
The challenge in fine-grained image classification lies in distinguishing subtle differences among fine-grained images. Existing image classification methods often explore information only in isolated regions, without considering the relationships among these regions, resulting in incomplete information and a tendency to focus on individual parts. Posture information is hidden among these parts and therefore plays a crucial role in differentiating among similar categories. We thus propose a posture-guided part learning framework capable of extracting the hidden posture information among regions. In this framework, the dual-branch feature enhancement module (DBFEM) highlights discriminative information related to fine-grained objects by extracting attention information between the feature space and channels. The part selection module selects multiple discriminative parts based on the attention information from DBFEM. Building on this, the posture feature fusion module extracts semantic features from the discriminative parts and constructs posture features among different parts from these semantic features. Finally, by fusing the part semantic features with the posture features, a comprehensive representation of fine-grained object features is obtained, aiding in differentiating among similar categories. Extensive evaluations on three benchmark datasets demonstrate the competitiveness of the proposed framework compared with state-of-the-art methods.
{"title":"Posture-guided part learning for fine-grained image categorization","authors":"Wei Song, Dongmei Chen","doi":"10.1117/1.jei.33.3.033013","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033013","url":null,"abstract":"The challenge in fine-grained image classification tasks lies in distinguishing subtle differences among fine-grained images. Existing image classification methods often only explore information in isolated regions without considering the relationships among these parts, resulting in incomplete information and a tendency to focus on individual parts. Posture information is hidden among these parts, so it plays a crucial role in differentiating among similar categories. Therefore, we propose a posture-guided part learning framework capable of extracting hidden posture information among regions. In this framework, the dual-branch feature enhancement module (DBFEM) highlights discriminative information related to fine-grained objects by extracting attention information between the feature space and channels. The part selection module selects multiple discriminative parts based on the attention information from DBFEM. Building upon this, the posture feature fusion module extracts semantic features from discriminative parts and constructs posture features among different parts based on these semantic features. Finally, by fusing part semantic features with posture features, a comprehensive representation of fine-grained object features is obtained, aiding in differentiating among similar categories. Extensive evaluations on three benchmark datasets demonstrate the competitiveness of the proposed framework compared with state-of-the-art methods.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"23 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033002
Bingyin Tang, Fan Feng
We introduce a method for efficient and expressive high-resolution image synthesis, harnessing the power of variational autoencoders (VAEs) and transformers with sparse attention (SA) mechanisms. By utilizing VAEs, we can establish a context-rich vocabulary of image constituents, thereby capturing intricate image features in a superior manner compared with traditional techniques. Subsequently, we employ SA mechanisms within our transformer model, improving computational efficiency while dealing with long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model successfully integrates both nonspatial and spatial information while also incorporating temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate our method’s effectiveness in semantically guided synthesis of megapixel images. Our findings substantiate this method as a significant contribution to the field of high-resolution image synthesis.
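As a rough illustration of the sparse attention idea in the abstract above, the sketch below restricts each query token to a local window of keys. This is a generic banded pattern written as a mask (the full score matrix is still materialized here; efficient implementations avoid that), not the paper's specific sparsity scheme, and all tensor shapes are assumptions.

```python
import torch

def local_sparse_attention(q, k, v, window=64):
    """Windowed self-attention: each query attends only to keys within
    +/- `window` positions, cutting the effective context for long token
    sequences such as flattened high-resolution image grids."""
    # q, k, v: (batch, seq_len, dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, L, L)
    idx = torch.arange(q.size(1), device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window   # (L, L) boolean band
    scores = scores.masked_fill(~band, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

tokens = torch.randn(2, 256, 64)
out = local_sparse_attention(tokens, tokens, tokens, window=16)
print(out.shape)  # torch.Size([2, 256, 64])
```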
{"title":"Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms","authors":"Bingyin Tang, Fan Feng","doi":"10.1117/1.jei.33.3.033002","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033002","url":null,"abstract":"We introduce a method for efficient and expressive high-resolution image synthesis, harnessing the power of variational autoencoders (VAEs) and transformers with sparse attention (SA) mechanisms. By utilizing VAEs, we can establish a context-rich vocabulary of image constituents, thereby capturing intricate image features in a superior manner compared with traditional techniques. Subsequently, we employ SA mechanisms within our transformer model, improving computational efficiency while dealing with long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model successfully integrates both nonspatial and spatial information while also incorporating temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate our method’s effectiveness in semantically guided synthesis of megapixel images. Our findings substantiate this method as a significant contribution to the field of high-resolution image synthesis.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"15 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test-time adaptation via self-training with future information
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033012
Xin Wen, Hao Shen, Zhongqiu Zhao
Test-time adaptation (TTA) aims to address potential differences in data distribution between the training and testing phases by modifying a pretrained model based on each specific test sample. This process is especially crucial for deep learning models, as they often encounter frequent changes in the testing environment. Currently, popular TTA methods rely primarily on pseudo-labels (PLs) as supervision signals and fine-tune the model through backpropagation. Consequently, the success of the model’s adaptation depends directly on the quality of the PLs. High-quality PLs can enhance the model’s performance, whereas low-quality ones may lead to poor adaptation results. Intuitively, if the PLs predicted by the model for a given sample remain consistent in both the current and future states, it suggests a higher confidence in that prediction. Using such consistent PLs as supervision signals can greatly benefit long-term adaptation. Nevertheless, this approach may induce overconfidence in the model’s predictions. To counter this, we introduce a regularization term that penalizes overly confident predictions. Our proposed method is highly versatile and can be seamlessly integrated with various TTA strategies, making it immensely practical. We investigate different TTA methods on three widely used datasets (CIFAR10C, CIFAR100C, and ImageNetC) with different scenarios and show that our method achieves competitive or state-of-the-art accuracies on all of them.
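A minimal sketch of the kind of objective described above, pseudo-label cross-entropy plus a penalty on overconfident outputs, is shown below. The confidence threshold, the negative-entropy form of the regularizer, and its weight are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def tta_loss(logits, pl_logits, lam=0.1, conf_thresh=0.9):
    """Pseudo-label TTA objective with an anti-overconfidence term.

    `pl_logits` are the predictions used to build pseudo-labels; `logits`
    are the adapted model's outputs on the same test batch.
    """
    probs = pl_logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = conf >= conf_thresh                           # keep confident pseudo-labels only
    ce = F.cross_entropy(logits[mask], pseudo[mask]) if mask.any() else logits.sum() * 0.0

    # Negative entropy is large when predictions are overconfident, so adding it
    # (weighted by lam) penalizes overly peaked output distributions.
    p = logits.softmax(dim=-1)
    neg_entropy = (p * p.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return ce + lam * neg_entropy

loss = tta_loss(torch.randn(32, 10, requires_grad=True), torch.randn(32, 10))
print(loss.item())
```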
{"title":"Test-time adaptation via self-training with future information","authors":"Xin Wen, Hao Shen, Zhongqiu Zhao","doi":"10.1117/1.jei.33.3.033012","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033012","url":null,"abstract":"Test-time adaptation (TTA) aims to address potential differences in data distribution between the training and testing phases by modifying a pretrained model based on each specific test sample. This process is especially crucial for deep learning models, as they often encounter frequent changes in the testing environment. Currently, popular TTA methods rely primarily on pseudo-labels (PLs) as supervision signals and fine-tune the model through backpropagation. Consequently, the success of the model’s adaptation depends directly on the quality of the PLs. High-quality PLs can enhance the model’s performance, whereas low-quality ones may lead to poor adaptation results. Intuitively, if the PLs predicted by the model for a given sample remain consistent in both the current and future states, it suggests a higher confidence in that prediction. Using such consistent PLs as supervision signals can greatly benefit long-term adaptation. Nevertheless, this approach may induce overconfidence in the model’s predictions. To counter this, we introduce a regularization term that penalizes overly confident predictions. Our proposed method is highly versatile and can be seamlessly integrated with various TTA strategies, making it immensely practical. We investigate different TTA methods on three widely used datasets (CIFAR10C, CIFAR100C, and ImageNetC) with different scenarios and show that our method achieves competitive or state-of-the-art accuracies on all of them.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion trajectory reconstruction degree: a key frame selection criterion for surveillance video
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033009
Yunzuo Zhang, Yameng Liu, Jiayu Zhang, Shasha Zhang, Shuangshuang Wang, Yu Cheng
The primary focus of key frame extraction lies in capturing changes in the motion state from surveillance videos and treating them as crucial content. However, existing key frame evaluation indicators cannot accurately assess whether an algorithm captures such changes. Hence, key frame extraction methods are assessed here from the viewpoint of target trajectory reconstruction. The motion trajectory reconstruction degree (MTRD), a key frame selection criterion based on preserving the target's global and local motion information, is then proposed. This evaluation indicator first extracts key frames with various key frame extraction methods and reconstructs the motion trajectory from these key frames using a linear interpolation algorithm. The original motion trajectories of the target are then quantified and compared with the reconstructed motion trajectories. The smaller the MTRD discrepancy is, the better the trajectories overlap, and the more accurately the key frames extracted by that method describe the video content. Finally, inspired by the MTRD criterion, we develop an MTRD-oriented key frame extraction method for surveillance video. Simulation results demonstrate that MTRD captures variations in the global and local motion states more accurately and is more consistent with human visual perception.
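The reconstruction-and-comparison step described above can be sketched directly: rebuild the per-frame trajectory from key frames by linear interpolation and measure the gap to the original trajectory. Using the mean Euclidean distance as the discrepancy is an assumption; the paper's exact normalization may differ.

```python
import numpy as np

def mtrd(trajectory, key_idx):
    """Trajectory-reconstruction-degree style measure.

    `trajectory` is an (N, 2) array of per-frame target positions and
    `key_idx` the indices of the selected key frames. The trajectory is
    rebuilt by linear interpolation between key frames; smaller return
    values mean better overlap with the original trajectory."""
    n = len(trajectory)
    key_idx = np.sort(np.unique(key_idx))
    frames = np.arange(n)
    rx = np.interp(frames, key_idx, trajectory[key_idx, 0])
    ry = np.interp(frames, key_idx, trajectory[key_idx, 1])
    recon = np.stack([rx, ry], axis=1)
    return np.linalg.norm(recon - trajectory, axis=1).mean()

traj = np.cumsum(np.random.randn(100, 2), axis=0)   # synthetic target track
print(mtrd(traj, [0, 20, 55, 99]))                   # denser key frames -> lower value
```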
{"title":"Motion trajectory reconstruction degree: a key frame selection criterion for surveillance video","authors":"Yunzuo Zhang, Yameng Liu, Jiayu Zhang, Shasha Zhang, Shuangshuang Wang, Yu Cheng","doi":"10.1117/1.jei.33.3.033009","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033009","url":null,"abstract":"The primary focus of key frame extraction lies in extracting changes in the motion state from surveillance videos and considering them to be crucial content. However, existing key frame evaluation indicators cannot accurately assess whether the algorithm can capture them. Hence, key frame extraction methods are assessed from the viewpoint of target trajectory reconstruction. The motion trajectory reconstruction degree (MTRD), a key frame selection criterion based on maintaining target global and local motion information, is then put forth. Initially, this evaluation indicator extracts key frames using various key frame extraction methods and reconstructs the motion trajectory based on these key frames using a linear interpolation algorithm. Then, the original motion trajectories of the target are quantified and compared with the reconstructed set of motion trajectories. The more minor the MTRD discrepancy is, the better the trajectory overlap is, and the more accurate the key frames extracted with this method will be for the description of the video content. Finally, inspired by the novel MTRD criterion, we develop an MTRD-oriented key frame extraction method for the surveillance video. The outcomes of the simulations demonstrate that MTRD can more accurately capture the variations in the global and local motion states and is more compatible with the human visual perception.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"37 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HMNNet: research on exposure-based nighttime semantic segmentation
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033015
Yang Yang, Changjiang Liu, Hao Li, Chuan Liu
In recent years, various segmentation models have been developed. However, due to the limited availability of nighttime datasets and the complexity of nighttime scenes, high-performance nighttime semantic segmentation models remain scarce. Analysis of nighttime scenes reveals that the primary challenges are overexposure and underexposure. In view of this, our proposed Histogram Multi-Scale Retinex with Color Restoration and No-Exposure Semantic Segmentation Network model performs semantic segmentation of nighttime scenes and consists of three modules and a multi-head decoder. The three modules, namely Histogram, Multi-Scale Retinex with Color Restoration (MSRCR), and No Exposure (N-EX), aim to enhance the robustness of image segmentation under different lighting conditions. The Histogram module prevents over-fitting to well-lit images, and the MSRCR module enhances images with insufficient lighting, improving object recognition and facilitating segmentation. The N-EX module uses a dark channel prior method to remove excess light covering the surface of an object. Extensive experiments show that the three modules suit different network models and can be inserted and used flexibly. They significantly improve a model's segmentation ability on nighttime images while retaining good generalization ability. When added to the multi-head decoder network, mean intersection over union increases by 6.2% on the nighttime dataset Rebecca and by 1.5% on the daytime dataset CamVid.
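The dark channel prior mentioned for the N-EX module has a standard computation, sketched below. How the resulting map is used to suppress over-exposed regions is specific to the paper and not reproduced here, and the patch size is an assumption.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel prior: per-pixel minimum over the color channels,
    followed by a local minimum filter over a patch. Bright (glare or
    over-exposed) regions stand out as unusually high dark-channel values."""
    # img: float array in [0, 1], shape (H, W, 3)
    per_pixel_min = img.min(axis=2)
    return minimum_filter(per_pixel_min, size=patch)

img = np.random.rand(120, 160, 3).astype(np.float32)
dc = dark_channel(img)
print(dc.shape, float(dc.max()))
```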
{"title":"HMNNet: research on exposure-based nighttime semantic segmentation","authors":"Yang Yang, Changjiang Liu, Hao Li, Chuan Liu","doi":"10.1117/1.jei.33.3.033015","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033015","url":null,"abstract":"In recent years, various segmentation models have been developed successively. However, due to the limited availability of nighttime datasets and the complexity of nighttime scenes, there remains a scarcity of high-performance nighttime semantic segmentation models. Analysis of nighttime scenes has revealed that the primary challenges encountered are overexposure and underexposure. In view of this, our proposed Histogram Multi-scale Retinex with Color Restoration and No-Exposure Semantic Segmentation Network model is based on semantic segmentation of nighttime scenes and consists of three modules and a multi-head decoder. The three modules—Histogram, Multi-Scale Retinex with Color Restoration (MSRCR), and No Exposure (N-EX)—aim to enhance the robustness of image segmentation under different lighting conditions. The Histogram module prevents over-fitting to well-lit images, and the MSRCR module enhances images with insufficient lighting, improving object recognition and facilitating segmentation. The N-EX module uses a dark channel prior method to remove excess light covering the surface of an object. Extensive experiments show that the three modules are suitable for different network models and can be inserted and used at will. They significantly improve the model’s segmentation ability for nighttime images while having good generalization ability. When added to the multi-head decoder network, mean intersection over union increases by 6.2% on the nighttime dataset Rebecca and 1.5% on the daytime dataset CamVid.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"27 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep metric learning method for open-set iris recognition
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033016
Guang Huo, Ruyuan Li, Jianlou Lou, Xiaolu Yu, Jiajun Wang, Xinlei He, Yue Wang
Existing iris recognition methods offer excellent recognition performance for known classes, but they do not perform well when faced with unknown classes. The process of identifying unknown classes is referred to as open-set recognition. To improve the robustness of iris recognition systems, this work integrates hash centers to construct a deep metric learning method for open-set iris recognition, called central similarity based deep hash. It first maps each iris category to a defined hash center using a hash center generation algorithm. Then, OiNet is trained so that each iris texture clusters around its corresponding hash center. For testing, the cosine similarity between each pair of iris textures is calculated to estimate their similarity. Experiments on public datasets, including evaluations both within a dataset and across different datasets, show that our method has substantial performance advantages over other open-set iris recognition algorithms.
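A small sketch of the hash-center idea follows: one common way to generate well-separated binary centers (used in earlier central-similarity hashing work) is to take rows of a Hadamard matrix, and matching is then done by cosine similarity. Whether the paper's generation algorithm is Hadamard-based is an assumption; `make_hash_centers` and the code length are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

def make_hash_centers(num_classes, code_len):
    """Build mutually orthogonal +/-1 hash centers from a Hadamard matrix
    (code_len must be a power of two for scipy.linalg.hadamard)."""
    H = hadamard(code_len)
    return H[:num_classes].astype(np.float32)

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

centers = make_hash_centers(num_classes=10, code_len=64)
probe = centers[3] + 0.3 * np.random.randn(64)      # noisy embedding of class 3
scores = [cosine_sim(probe, c) for c in centers]
print(int(np.argmax(scores)))                        # expected: 3
```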
{"title":"Deep metric learning method for open-set iris recognition","authors":"Guang Huo, Ruyuan Li, Jianlou Lou, Xiaolu Yu, Jiajun Wang, Xinlei He, Yue Wang","doi":"10.1117/1.jei.33.3.033016","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033016","url":null,"abstract":"The existing iris recognition methods offer excellent recognition performance for known classes, but they do not perform well when faced with unknown classes. The process of identifying unknown classes is referred to as open-set recognition. To improve the robustness of iris recognition system, this work integrates a hash center to construct a deep metric learning method for open-set iris recognition, called central similarity based deep hash. It first maps each iris category into defined hash centers using a generation hash center algorithm. Then, OiNet is trained to each iris texture to cluster around the corresponding hash center. For testing, cosine similarity is calculated for each pair of iris textures to estimate their similarity. Based on experiments conducted on public datasets, along with evaluations of performance within the dataset and across different datasets, our method demonstrates substantial performance advantages compared with other algorithms for open-set iris recognition.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"127 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140932889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
KT-NeRF: multi-view anti-motion blur neural radiance fields
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033006
Yining Wang, Jinyi Zhang, Yuxi Jiang
In the field of three-dimensional (3D) reconstruction, neural radiance fields (NeRF) can implicitly represent high-quality 3D scenes. However, traditional neural radiance fields place very high demands on the quality of the input images. When motion-blurred images are input, NeRF's requirement of multi-view consistency cannot be met, which significantly degrades the quality of the 3D reconstruction. To address this problem, we propose KT-NeRF, which extends NeRF to motion-blurred scenes. Starting from the principle of motion blur, the method is derived from two-dimensional (2D) motion-blurred images to 3D space. A Gaussian process regression model is then introduced to estimate the motion trajectory of the camera for each motion-blurred image, with the aim of learning accurate camera poses at key time stamps within the exposure time. The camera poses at the key time stamps are used as inputs to the NeRF so that it can learn the blur information embedded in the images. Finally, the parameters of the Gaussian process regression model and the NeRF are jointly optimized to achieve multi-view anti-motion blur. Experiments show that KT-NeRF achieves a peak signal-to-noise ratio of 29.4 and a structural similarity index of 0.85, increases of 3.5% and 2.4%, respectively, over existing advanced methods. The learned perceptual image patch similarity is also reduced by 7.1%, to 0.13.
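The Gaussian process step described above, fitting a smooth camera trajectory over the exposure and querying it at key time stamps, can be sketched with scikit-learn as below. The kernel, the translation-only pose representation, and the sample values are assumptions, and the joint optimization with the NeRF is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Fit a smooth camera-translation trajectory over the exposure interval
# and query it at key time stamps (rotations and NeRF coupling omitted).
t_obs = np.array([[0.0], [0.5], [1.0]])            # assumed coarse pose time stamps
xyz_obs = np.array([[0.00, 0.00, 0.00],
                    [0.02, 0.01, 0.00],
                    [0.05, 0.01, -0.01]])          # assumed camera positions

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
gp.fit(t_obs, xyz_obs)

t_key = np.linspace(0.0, 1.0, 7).reshape(-1, 1)    # key time stamps within the exposure
xyz_key = gp.predict(t_key)
print(xyz_key.shape)                                # (7, 3) interpolated camera positions
```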
{"title":"KT-NeRF: multi-view anti-motion blur neural radiance fields","authors":"Yining Wang, Jinyi Zhang, Yuxi Jiang","doi":"10.1117/1.jei.33.3.033006","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.033006","url":null,"abstract":"In the field of three-dimensional (3D) reconstruction, neural radiation fields (NeRF) can implicitly represent high-quality 3D scenes. However, traditional neural radiation fields place very high demands on the quality of the input images. When motion blurred images are input, the requirement of NeRF for multi-view consistency cannot be met, which results in a significant degradation in the quality of the 3D reconstruction. To address this problem, we propose KT-NeRF that extends NeRF to motion blur scenes. Based on the principle of motion blur, the method is derived from two-dimensional (2D) motion blurred images to 3D space. Then, Gaussian process regression model is introduced to estimate the motion trajectory of the camera for each motion blurred image, with the aim of learning accurate camera poses at key time stamps during the exposure time. The camera poses at the key time stamps are used as inputs to the NeRF in order to allow the NeRF to learn the blur information embedded in the images. Finally, the parameters of the Gaussian process regression model and the NeRF are jointly optimized to achieve multi-view anti-motion blur. The experiment shows that KT-NeRF achieved a peak signal-to-noise ratio of 29.4 and a structural similarity index of 0.85, an increase of 3.5% and 2.4%, respectively, over existing advanced methods. The learned perceptual image patch similarity was also reduced by 7.1% to 0.13.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"105 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140839785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Three-dimensional shape estimation of wires from three-dimensional X-ray computed tomography images of electrical cables
Pub Date: 2024-04-01 | DOI: 10.1117/1.jei.33.3.031209
Shiori Ueda, Kanon Sato, Hideo Saito, Yutaka Hoshina
Electrical cables consist of numerous wires, whose three-dimensional (3D) shape significantly impacts the cables' overall properties, such as bending stiffness. Although X-ray computed tomography (CT) provides a non-destructive way to assess these properties, accurately determining the 3D shape of individual wires from CT images is challenging due to the large number of wires, the low image resolution, and the indistinguishable appearance of the wires. Previous research lacked quantitative evaluation of wire tracking, and its overall accuracy relied heavily on the accuracy of wire detection. In this study, we present a long short-term memory-based approach to wire tracking that improves robustness against detection errors. The proposed method predicts wire positions in subsequent frames based on previous frames. We evaluate its performance using both actual annotated cables and artificially noised annotations. Our method exhibits greater tracking accuracy and robustness to detection errors than the previous method.
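A minimal sketch of the LSTM-based prediction step described above is given below: the network consumes a wire's positions in previous CT frames and predicts its position in the next one. The layer sizes, sequence length, and the `WireTracker` name are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class WireTracker(nn.Module):
    """LSTM that maps a wire's (x, y) positions in previous CT frames to a
    predicted (x, y) position in the next frame."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, prev_xy):          # prev_xy: (batch, n_frames, 2)
        out, _ = self.lstm(prev_xy)
        return self.head(out[:, -1])     # predicted (x, y) in the next frame

model = WireTracker()
history = torch.randn(8, 10, 2)          # 8 wires, 10 previous frames each
print(model(history).shape)              # torch.Size([8, 2])
```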
Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device
Pub Date: 2024-03-01 | DOI: 10.1117/1.jei.33.3.031208
Stefano Toigo, Brendon Kasi, Daniele Fornasier, Angelo Cenedese
This paper delineates a comprehensive method that integrates machine vision and deep learning for quality control in an industrial setting. The proposed approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while relying on affordable, compact hardware, and it achieves exceptionally high accuracy on the quality control task with minimal computation time. The developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and for external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By integrating the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work.
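As a loose illustration of the microservice style described above, the sketch below exposes a single HTTP inspection endpoint such as might run on a smart camera. The endpoint name, the toy defect score, and the threshold are placeholders, not the system presented in the paper.

```python
from fastapi import FastAPI, File, UploadFile
import cv2
import numpy as np

# One self-contained inspection microservice: receive an image, compute a
# score, return a pass/fail verdict. Run with: uvicorn service:app --port 8000
app = FastAPI()

@app.post("/inspect")
async def inspect(image: UploadFile = File(...)):
    raw = np.frombuffer(await image.read(), dtype=np.uint8)
    frame = cv2.imdecode(raw, cv2.IMREAD_GRAYSCALE)
    # Placeholder analysis: flag frames whose intensity variance is unusually low
    defect_score = float(frame.var())
    return {"pass": defect_score > 100.0, "score": defect_score}
```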
{"title":"Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device","authors":"Stefano Toigo, Brendon Kasi, Daniele Fornasier, Angelo Cenedese","doi":"10.1117/1.jei.33.3.031208","DOIUrl":"https://doi.org/10.1117/1.jei.33.3.031208","url":null,"abstract":"This paper aims to delineate a comprehensive method that integrates machine vision and deep learning for quality control within an industrial setting. The proposed innovative approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while focusing on the employment of affordable, compact hardware, and it achieves exceptionally high accuracy in performing the quality control task and keeping a minimal computation time. Consequently, the developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By leveraging the integration of the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work.","PeriodicalId":54843,"journal":{"name":"Journal of Electronic Imaging","volume":"88 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140076147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}