Length and salient losses co-supported content-based commodity retrieval neural network
Pub Date: 2024-06-01 | DOI: 10.1117/1.jei.33.3.033036
Mengqi Chen, Yifan Wang, Qian Sun, Weiming Wang, Fu Lee Wang
Content-based commodity retrieval (CCR) faces two major challenges: (1) commodities in real-world scenarios are often captured randomly by users, resulting in significant variations in image backgrounds, poses, shooting angles, and brightness; and (2) many commodities in the CCR dataset have similar appearances but belong to different brands or distinct products within the same brand. We introduce a CCR neural network called CCR-Net, which incorporates both length loss and salient loss. These two losses can operate independently or collaboratively to enhance retrieval quality. CCR-Net offers several advantages, including the ability to (1) minimize data variations in real-world captured images; and (2) differentiate between images containing highly similar but fundamentally distinct commodities, resulting in improved commodity retrieval capabilities. Comprehensive experiments demonstrate that our CCR-Net achieves state-of-the-art performance on the CUB200-2011, Perfect500k, and Stanford Online Products datasets for commodity retrieval tasks.
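The abstract does not spell out the form of the two losses, so the sketch below is only a hypothetical reading of them: a length term that regularizes embedding norms against capture-condition variation and a salient term that steers feature energy toward a saliency mask, combined with an ordinary triplet retrieval loss. All function names and weights are illustrative, not the authors' code.

```python
# Hypothetical sketch (not the paper's released implementation) of combining a triplet
# retrieval loss with assumed "length" and "salient" auxiliary terms.
import torch
import torch.nn.functional as F

def length_loss(embeddings: torch.Tensor, target_norm: float = 1.0) -> torch.Tensor:
    # Penalize deviation of each embedding's L2 norm from a target length, so that
    # capture conditions (lighting, pose) do not inflate feature magnitude.
    norms = embeddings.norm(dim=1)
    return ((norms - target_norm) ** 2).mean()

def salient_loss(features: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
    # Encourage spatial features (B, C, H, W) to concentrate energy on salient regions
    # given by a saliency mask (B, 1, H, W) with values in [0, 1].
    energy = features.pow(2).mean(dim=1, keepdim=True)
    return (energy * (1.0 - saliency)).mean()

def ccr_total_loss(anchor, positive, negative, feat_maps, saliency,
                   w_len=0.1, w_sal=0.1, margin=0.2):
    trip = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    return trip + w_len * length_loss(anchor) + w_sal * salient_loss(feat_maps, saliency)
```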
Spiking ViT: spiking neural networks with transformer-attention for steel surface defect classification
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033001
Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, Zhenghui Ge
Throughout the steel production process, a variety of surface defects inevitably occur. These defects can impair the quality of steel products and reduce manufacturing efficiency. Therefore, it is crucial to study and categorize the multiple defects on the surface of steel strips. The vision transformer (ViT) is a neural network model based on a self-attention mechanism that is widely used in many different disciplines. A conventional ViT ignores the specifics of brain signaling and instead uses activation functions to simulate genuine neurons. One of the fundamental building blocks of a spiking neural network is the leaky integrate-and-fire (LIF) neuron, which has biodynamic characteristics akin to those of a genuine neuron. LIF neurons work in an event-driven manner, so higher performance can be achieved with less power. The goal of this work is to integrate ViT and LIF neurons to build and train an end-to-end hybrid network architecture, the spiking vision transformer (S-ViT), for the classification of steel surface defects. The framework builds on the ViT architecture by replacing the activation functions used in ViT with LIF neurons, constructing a spiking transformer encoder with a global spike feature fusion module together with a spiking-MLP classification head that implements the classification functionality, and using these as the basic building blocks of S-ViT. The experimental results show that our method achieves outstanding classification performance across all metrics. The overall test accuracies of S-ViT are 99.41%, 99.65%, 99.54%, and 99.77% on NEU-CLS, and 95.70%, 95.93%, 96.94%, and 97.19% on XSDD. S-ViT achieves superior classification performance compared with convolutional neural networks and recently reported methods, and its performance also improves on the original ViT model. Furthermore, the robustness tests show that S-ViT maintains reliable accuracy when recognizing images that contain Gaussian noise.
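For reference, a minimal leaky integrate-and-fire (LIF) layer of the kind described above is sketched below; it uses the textbook discretized LIF update rather than the authors' implementation, and the surrogate gradients needed for end-to-end training are omitted.

```python
# Minimal LIF neuron layer: leaky integration of input current, threshold firing, reset.
import torch
import torch.nn as nn

class LIFNeuron(nn.Module):
    def __init__(self, tau: float = 2.0, v_threshold: float = 1.0, v_reset: float = 0.0):
        super().__init__()
        self.tau = tau                  # membrane time constant (leak factor 1/tau)
        self.v_threshold = v_threshold  # firing threshold
        self.v_reset = v_reset          # membrane potential after a spike

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (T, B, ...) input current over T time steps; returns binary spike trains.
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:
            v = v + (x - (v - self.v_reset)) / self.tau      # leaky integration
            spike = (v >= self.v_threshold).float()          # fire when threshold reached
            v = (1.0 - spike) * v + spike * self.v_reset     # reset fired neurons
            spikes.append(spike)
        return torch.stack(spikes)
```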
Flotation froth image deblurring algorithm based on disentangled representations
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033011
Xianwu Huang, Yuxiao Wang, Zhao Cao, Haili Shang, Jinshan Zhang, Dahua Yu
The deblurring of flotation froth images significantly aids in the characterization of coal flotation and fault diagnosis. Images of froth acquired at a flotation site contain considerable noise and blurring, making feature extraction and segmentation processing difficult. We present an effective method for deblurring froth images based on disentangled representations. Disentangled representation is achieved by separating the content and blur features in the blurred image using a content encoder and a blur encoder. Then, the separated feature vectors are embedded into a deblurring framework to deblur the froth image. The experimental results show that this method achieves a superior deblurring effect on froth images under various conditions, which lays the foundation for the intelligent adjustment of parameters to guide the flotation site.
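A rough sketch of the content/blur disentanglement idea is given below; the layer sizes, fusion strategy, and module names are assumptions for illustration only, not the paper's architecture.

```python
# Illustrative disentangled deblurring sketch: two encoders split a blurred froth image
# into content and blur codes, and a decoder reconstructs a sharp image from both.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, 2, 1),
                         nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class DisentangledDeblurNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.content_enc = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.blur_enc = nn.Sequential(conv_block(3, 32), conv_block(32, 64),
                                      nn.AdaptiveAvgPool2d(1))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, blurred):
        c = self.content_enc(blurred)                                     # spatial content features
        b = self.blur_enc(blurred).expand(-1, -1, c.size(2), c.size(3))  # global blur code
        return self.decoder(torch.cat([c, b], dim=1))                     # deblurred estimate
```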
Effective grasp detection method based on Swin transformer
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033008
Jing Zhang, Yulin Tang, Yusong Luo, Yukun Du, Mingju Chen
Grasp detection in unstructured environments encounters challenges that lower the success rate of grasping attempts, attributable to factors including object uncertainty, random positions, and differences in perspective. This work proposes a grasp detection framework, Swin-transNet, which treats graspable objects as a generalized category and distinguishes between graspable and non-graspable objects. The Swin transformer module in this framework augments feature extraction, enabling the capture of global relationships within images. Subsequently, the integration of a decoupled head with attention mechanisms further refines the channel and spatial representation of features. This combination markedly improves the system’s adaptability to uncertain object categories and random positions, culminating in the precise output of grasping information; we also elucidate the roles of these modules in grasping tasks. We evaluate the grasp detection framework on the Cornell grasp dataset under both image-wise and object-wise splits. The experiments indicate a detection accuracy of 98.1% and a detection time of 52 ms. Swin-transNet shows robust generalization on the Jacquard dataset, attaining a detection accuracy of 95.2%, and it demonstrates an 87.8% success rate in real-world grasping tests on a visual grasping system, confirming its effectiveness for robotic grasping tasks.
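As a loose illustration of a decoupled head combined with channel attention (a squeeze-and-excitation style block is assumed here; the paper's exact attention design and head layout are not reproduced), consider the following sketch.

```python
# Hedged sketch of a decoupled grasp head: channel attention, then separate
# classification (graspable vs. non-graspable) and grasp-rectangle regression branches.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)  # reweight channels by global context

class DecoupledGraspHead(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.attn = ChannelAttention(channels)
        self.cls_branch = nn.Conv2d(channels, 2, 1)   # graspable / non-graspable score
        self.reg_branch = nn.Conv2d(channels, 5, 1)   # (x, y, w, h, angle) per location

    def forward(self, feats):
        feats = self.attn(feats)
        return self.cls_branch(feats), self.reg_branch(feats)
```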
Truss tomato detection under artificial lighting in greenhouse using BiSR_YOLOv5
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033014
Xiaoyou Yu, Zixiao Wang, Zhonghua Miao, Nan Li, Teng Sun
The visual characteristics of greenhouse-grown tomatoes undergo significant alterations under artificial lighting, presenting substantial challenges for accurate target detection. To address the diverse appearances of targets, we propose an improved You Only Look Once Version 5 (YOLOv5) model named BiSR_YOLOv5, incorporating a single-point and regional feature fusion module (SRFM) and a bidirectional spatial pyramid pooling fast (Bi-SPPF) module. In addition, the model adopts the SCYLLA intersection-over-union (SIoU) loss instead of the complete intersection-over-union (CIoU) loss. Experimental results reveal that the BiSR_YOLOv5 model achieves F1 and mAP@0.5 scores of 0.867 and 0.894, respectively, for detecting truss tomatoes. These scores are 2.36 and 1.82 percentage points higher than those achieved by the baseline YOLOv5 algorithm. Notably, the model maintains a size of 13.8M and achieves real-time performance at 35.1 frames per second. Analysis of detection results for both large and small objects indicates that the Bi-SPPF module, which emphasizes finer feature details, is better suited for detecting small targets, whereas the SRFM module, with a larger receptive field, is better suited for detecting larger targets. In summary, the BiSR_YOLOv5 test results validate the positive impact of accurate identification on subsequent agricultural operations, such as yield estimation or harvesting, achieved through a simple maturity algorithm that utilizes the process of “finding flaws.”
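For context, the standard YOLOv5-style SPPF block that Bi-SPPF presumably extends is sketched below (normalization and activation layers are omitted for brevity); the bidirectional extension itself is the paper's contribution and is not reproduced here.

```python
# Plain YOLOv5-style SPPF: one 1x1 conv, three stacked 5x5 max-pools, concat, 1x1 conv.
import torch
import torch.nn as nn

class SPPF(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)     # 5x5 pooling
        y2 = self.pool(y1)    # stacking emulates a 9x9 receptive field
        y3 = self.pool(y2)    # and a 13x13 receptive field
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```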
Enhancing steganography capacity through multi-stage generator model in generative adversarial network based image concealment
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033026
Bisma Sultan, Mohd. Arif Wani
Traditional steganography algorithms use procedures created by human experts to conceal a secret message inside a cover medium. Generative adversarial networks (GANs) have recently been used to automate this process. However, GAN-based steganography has some limitations: the capacity of these models is limited, and increasing the steganography capacity decreases security, increases distortion, and degrades the performance of the extractor network. In this work, an approach for developing a generator model for image steganography is proposed. The approach builds a generator model, called the late embedding generator model, in two stages. The first stage of the generator model uses only the flattened cover image, and the second stage uses a secret message together with the first stage’s output to generate the stego image. Furthermore, a dual-training strategy is employed to train the generator network: the first stage focuses on learning fundamental image features through a reconstruction loss, and the second stage is trained with three loss terms, including an adversarial loss, to incorporate the secret message. The proposed approach demonstrates that hiding data only in the deeper layers of the generator network boosts capacity without requiring complex architectures, reducing computation and storage requirements. The efficacy of the approach is evaluated by varying the depth of the two stages, resulting in four generator models. A comprehensive set of experiments was performed on the CelebA dataset, which contains more than 200,000 samples. The results show that the late embedding model performs better than state-of-the-art models and increases the steganography capacity to more than four times that of existing GAN-based steganography methods. The extracted payload achieves an accuracy of 99.98%, with the extractor model successfully decoding the secret message.
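A minimal sketch of the late embedding idea, under assumed layer sizes and with fully connected layers standing in for whatever the authors actually use, might look as follows; the dual-training losses described above would be applied on top of this generator.

```python
# Hedged two-stage "late embedding" generator sketch: stage 1 sees only the flattened
# cover image; stage 2 takes the secret bits plus stage 1's output and emits the stego image.
import torch
import torch.nn as nn

class LateEmbeddingGenerator(nn.Module):
    def __init__(self, cover_dim: int = 64 * 64 * 3, msg_bits: int = 4096, hidden: int = 2048):
        super().__init__()
        self.stage1 = nn.Sequential(              # cover-only feature learning
            nn.Linear(cover_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True))
        self.stage2 = nn.Sequential(              # late embedding of the secret message
            nn.Linear(hidden + msg_bits, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, cover_dim), nn.Tanh())

    def forward(self, cover_flat: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
        h = self.stage1(cover_flat)
        return self.stage2(torch.cat([h, message], dim=1))  # flattened stego image
```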
Multi-view stereo of an object immersed in a refractive medium
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033005
Robin Bruneau, Baptiste Brument, Lilian Calvet, Matthew Cassidy, Jean Mélou, Yvain Quéau, Jean-Denis Durou, François Lauze
In this article, we show how to extend the multi-view stereo technique when the object to be reconstructed is inside a transparent but refractive medium, which causes distortions in the images. We provide a theoretical formulation of the problem accounting for a general non-planar shape of the refractive interface, and then a discrete solving method. We also present a pipeline to recover precisely the geometry of the refractive interface, considered as a convex polyhedral object. It is based on the extraction of visible polyhedron vertices from silhouette images and matching across a sequence of images acquired under circular camera motion. These contributions are validated by tests on synthetic and real data.
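The distortion being modeled is ordinary refraction at the interface; the textbook vector form of Snell's law, applied per ray with the local surface normal of the (possibly non-planar) interface, is sketched below. This is standard optics rather than the authors' specific solver.

```python
# Standard per-ray refraction (vector form of Snell's law).
import numpy as np

def refract(incident: np.ndarray, normal: np.ndarray, n1: float, n2: float):
    """Return the refracted unit direction, or None on total internal reflection.

    incident: unit ray direction; normal: unit surface normal pointing toward the
    incident medium; n1, n2: refractive indices of the incident/transmitting media."""
    eta = n1 / n2
    cos_i = -float(np.dot(normal, incident))
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None                       # total internal reflection, no transmitted ray
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * incident + (eta * cos_i - cos_t) * normal
```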
Polarization spatial and semantic learning lightweight network for underwater salient object detection
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033010
Xiaowen Yang, Qingwu Li, Dabing Yu, Zheng Gao, Guanying Huo
The absorption by a water body and the scattering of suspended particles cause blurring of object features, which results in a reduced accuracy of underwater salient object detection (SOD). Thus, we propose a polarization spatial and semantic learning lightweight network for underwater SOD. The proposed method is based on a lightweight MobileNetV2 network. Because lightweight networks are not as capable as deep networks in capturing and learning features of complex objects, we build specific feature extraction and fusion modules at different depth stages of backbone network feature extraction to enhance the feature learning capability of the lightweight backbone network. Specifically, we embed a structural feature learning module in the low-level feature extraction stage and a semantic feature learning module in the high-level feature extraction stage to maintain the spatial consistency of low-level features and the semantic commonality of high-level features. We acquired polarized images of underwater objects in natural underwater scenes and constructed a polarized object detection dataset (PODD) for object detection in the underwater environment. Experimental results show that the detection effect of the proposed method on the PODD is better than other SOD methods. Also, we conduct comparative experiments on the RGB-thermal (RGB-T) and RGB-depth (RGB-D) datasets to verify the generalization of the proposed method.
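As an illustration of tapping a lightweight backbone at two depths (the split point and module placement below are assumptions, not the paper's configuration), a torchvision MobileNetV2 can be sliced as follows.

```python
# Two-depth tap on a MobileNetV2 backbone: a low-level (structural) and a high-level
# (semantic) feature map, to which stage-specific learning modules could be attached.
import torch.nn as nn
from torchvision.models import mobilenet_v2

class TwoStageMobileNetBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        features = mobilenet_v2(weights=None).features
        self.low_stage = features[:4]    # early blocks: edges and textures, high resolution
        self.high_stage = features[4:]   # later blocks: semantic features, low resolution

    def forward(self, x):
        low = self.low_stage(x)          # feed to a structural feature learning module
        high = self.high_stage(low)      # feed to a semantic feature learning module
        return low, high
```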
Image watermarking scheme employing Gerchberg–Saxton algorithm and integer wavelet transform
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033003
Chaoxia Zhang, Kaiqi Liang, Shangzhou Zhang, Zhihao Chen
Image watermarking technology plays a key role in the protection of intellectual property rights. In addition to digital watermarking, optical watermarking has also received wide attention. A watermarking scheme based on the Gerchberg–Saxton (GS) algorithm and the integer wavelet transform (IWT) is proposed for image encryption. The scheme uses the unique phase reconstruction characteristics of the GS algorithm, which enables it to cope with a variety of complex local attacks during protection and to effectively restore the original image information. Position and pixel-value information are obfuscated by means of variable-step Joseph space scrambling and pixel-value bit processing. The carrier image is decomposed into subbands of different frequencies using the IWT, and all the information of the secret image is embedded bit by bit, hiding the image information. In addition, the SHA-256 function, the RSA algorithm, and a hyperchaotic system are combined to generate the cipher stream. The experimental results show that the algorithm has good imperceptibility and security as well as strong robustness to cropping and salt-and-pepper noise attacks, and it can restore the image quality well.
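The Gerchberg–Saxton iteration at the core of the scheme is standard phase retrieval; a NumPy sketch is given below. How the recovered phase is coupled to the IWT embedding, Joseph scrambling, and cipher stream is specific to the paper and not reproduced here.

```python
# Classic Gerchberg–Saxton loop: alternately impose known amplitudes in the source and
# target (Fourier) planes while keeping only the evolving phase.
import numpy as np

def gerchberg_saxton(source_amp: np.ndarray, target_amp: np.ndarray, iterations: int = 100):
    """Recover a source-plane phase such that a field with source_amp amplitude
    propagates (via FFT) to a field with target_amp amplitude."""
    phase = np.random.uniform(0, 2 * np.pi, source_amp.shape)
    for _ in range(iterations):
        field = source_amp * np.exp(1j * phase)        # impose source amplitude
        far = np.fft.fft2(field)
        far = target_amp * np.exp(1j * np.angle(far))  # impose target amplitude
        field = np.fft.ifft2(far)
        phase = np.angle(field)                        # keep only the phase
    return phase
```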
Improving image super-resolution with structured knowledge distillation-based multimodal denoising diffusion probabilistic model
Pub Date: 2024-05-01 | DOI: 10.1117/1.jei.33.3.033004
Li Huang, JingKe Yan, Min Wang, Qin Wang
In the realm of low-resolution (LR) to high-resolution (HR) image reconstruction, denoising diffusion probabilistic models (DDPMs) are recognized for their superior perceptual quality over other generative models, attributed to their adept handling of various degradation factors in LR images, such as noise and blur. However, DDPMs predominantly focus on a single modality in the super-resolution (SR) image reconstruction from LR images, thus overlooking the rich potential information in multimodal data. This lack of integration and comprehensive processing of multimodal data can impede the full utilization of the complementary characteristics of different data types, limiting their effectiveness across a broad range of applications. Moreover, DDPMs require thousands of evaluations to reconstruct high-quality SR images, which significantly impacts their efficiency. In response to these challenges, a novel multimodal DDPM based on structured knowledge distillation (MKDDPM) is introduced. This approach features a multimodal-based DDPM that effectively leverages sparse prior information from another modality, integrated into the MKDDPM network architecture to optimize the solution space and detail features of the reconstructed image. Furthermore, a structured knowledge distillation method is proposed, leveraging a well-trained DDPM and iteratively learning a new DDPM, with each iteration requiring only half the original sampling steps. This method significantly reduces the number of model sampling steps without compromising on sampling quality. Experimental results demonstrate that MKDDPM, even with a substantially reduced number of diffusion steps, still achieves superior performance, providing a novel solution for single-image SR tasks.
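Reading the per-iteration halving as relative to the current teacher, the sampling-step budget shrinks geometrically across distillation rounds; the arithmetic is illustrated below (the distillation losses themselves are not shown, and the step counts are examples rather than the paper's settings).

```python
# Step-budget arithmetic for iterative step-halving distillation: after k rounds, an
# N-step sampler is reduced to N // 2**k steps.
def distillation_schedule(initial_steps: int = 1000, rounds: int = 4):
    return [initial_steps // (2 ** k) for k in range(rounds + 1)]

if __name__ == "__main__":
    print(distillation_schedule())   # -> [1000, 500, 250, 125, 62]
```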