
Latest publications in the Journal of Electronic Imaging

Small space target detection using megapixel resolution CeleX-V camera
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053002
Yuanyuan Lv, Liang Zhou, Zhaohui Liu, Wenlong Qiao, Haiyang Zhang
An event camera (EC) is a bioinspired vision sensor with the advantages of a high temporal resolution, high dynamic range, and low latency. Due to the inherent sparsity of space target imaging data, EC becomes an ideal imaging sensor for space target detection. In this work, we conduct detection of small space targets using a CeleX-V camera with a megapixel resolution. We propose a target detection method based on field segmentation, utilizing the event output characteristics of an EC. This method enables real-time monitoring of the spatial positions of space targets within the camera’s field of view. The effectiveness of this approach is validated through experiments involving real-world observations of space targets. Using the proposed method, real-time observation of space targets with a megapixel resolution EC becomes feasible, demonstrating substantial practical potential in the field of space target detection.
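As a rough illustration of the field-segmentation idea, the sketch below bins an event stream into a grid of sub-fields and flags the sub-fields whose event counts stand out from the background rate. The grid size, the k-sigma threshold rule, and the event layout (pixel coordinates accumulated over a short time window) are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def detect_fields(events, sensor_hw=(800, 1280), grid=(16, 16), k_sigma=3.0):
    """Flag grid cells ("fields") whose event count is anomalously high.

    events: (N, 2) integer array of (y, x) pixel coordinates accumulated over a
    short time window. Returns approximate target centers in pixel coordinates.
    """
    h, w = sensor_hw
    gy, gx = grid
    # Map each event to a field index.
    fy = np.clip(events[:, 0].astype(np.int64) * gy // h, 0, gy - 1)
    fx = np.clip(events[:, 1].astype(np.int64) * gx // w, 0, gx - 1)
    counts = np.zeros((gy, gx), dtype=np.int64)
    np.add.at(counts, (fy, fx), 1)                        # events per field
    # Background activity estimate: mean + k * std over all fields.
    threshold = counts.mean() + k_sigma * counts.std()
    ys, xs = np.nonzero(counts > threshold)
    return [((y + 0.5) * h / gy, (x + 0.5) * w / gx) for y, x in zip(ys, xs)]
```

Because the statistics are per field rather than per pixel, a pass like this can be recomputed over short time windows, which is what makes real-time position monitoring at megapixel resolution plausible.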
Citations: 0
Light field salient object detection network based on feature enhancement and mutual attention
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053001
Xi Zhu, Huai Xia, Xucheng Wang, Zhenrong Zheng
Light field salient object detection (SOD) is an essential research topic in computer vision, but robust saliency detection in complex scenes is still very challenging. We propose a new method for accurate and robust light field SOD via convolutional neural networks containing feature enhancement modules. First, the light field dataset is extended by geometric transformations such as stretching, cropping, flipping, and rotating. Next, two feature enhancement modules are designed to extract features from RGB images and depth maps, respectively. The obtained feature maps are fed into a two-stream network to train the light field SOD. We propose a mutual attention approach in this process, extracting and fusing features from RGB images and depth maps. Therefore, our network can generate an accurate saliency map from the input light field images after training. The obtained saliency map can provide reliable a priori information for tasks such as semantic segmentation, target recognition, and visual tracking. Experimental results show that the proposed method achieves excellent detection performance in public benchmark datasets and outperforms the state-of-the-art methods. We also verify the generalization and stability of the method in real-world experiments.
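To make the mutual-attention idea concrete, here is a minimal PyTorch sketch in which the RGB stream and the depth stream cross-attend to each other and the two results are fused with a 1x1 convolution. The channel count, head count, and fusion layer are illustrative assumptions, not the layer configuration used in the paper.

```python
import torch
import torch.nn as nn

class MutualAttentionFusion(nn.Module):
    """Cross-attends RGB and depth feature maps in both directions, then fuses them."""
    def __init__(self, channels=256, heads=4):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb, depth):             # both: (B, C, H, W)
        b, c, h, w = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)      # (B, H*W, C)
        d = depth.flatten(2).transpose(1, 2)
        r2, _ = self.rgb_from_depth(r, d, d)    # RGB queries depth
        d2, _ = self.depth_from_rgb(d, r, r)    # depth queries RGB
        r2 = r2.transpose(1, 2).reshape(b, c, h, w)
        d2 = d2.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([r2, d2], dim=1))
```

In a two-stream design, a block like this would sit at one or more decoder scales before the saliency prediction head.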
Citations: 0
Video anomaly detection based on frame memory bank and decoupled asymmetric convolutions
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053006
Min Zhao, Chuanxu Wang, Jiajiong Li, Zitai Jiang
Video anomaly detection (VAD) is essential for monitoring systems. Prediction-based methods identify anomalies by comparing differences between predicted and real frames. We propose an unsupervised VAD method based on a frame memory bank (FMB) and decoupled asymmetric convolution (DAConv), which addresses three problems encountered with auto-encoders (AE) in VAD: (1) how to mitigate the noise resulting from jitter between frames, which is usually ignored; (2) how to alleviate the insufficient utilization of temporal information in traditional two-dimensional (2D) convolution and the heavy computational burden of three-dimensional (3D) convolution; and (3) how to make full use of normal data to improve the reliability of anomaly discrimination. Specifically, we first design a separate network to calibrate video frames within the dataset. Second, we design DAConv to extract features from the video, addressing the absence of temporal dimension information in 2D convolutions and the high computational complexity of 3D convolutions. Concurrently, the interval-frame mechanism mitigates the problem of information redundancy caused by data reuse. Finally, we embed an FMB to store features of normal events, amplifying the contrast between normal and abnormal frames. We conduct extensive experiments on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, achieving AUC values of 98.7%, 90.4%, and 74.8%, respectively, which fully demonstrates the rationality and effectiveness of the proposed method.
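The 2D-versus-3D trade-off in point (2) above can be sketched as follows: a kxk spatial kernel is decomposed into 1xk and kx1 branches, and temporal information is injected with a lightweight 1D convolution over the frame axis rather than a full 3D kernel. This is only one plausible reading of a decoupled asymmetric convolution, with made-up channel and frame counts; it is not the DAConv module from the paper.

```python
import torch
import torch.nn as nn

class DAConvSketch(nn.Module):
    """Asymmetric spatial branches plus a cheap temporal 1D convolution."""
    def __init__(self, channels=64, k=3, num_frames=4):
        super().__init__()
        p = k // 2
        self.horizontal = nn.Conv2d(channels, channels, (1, k), padding=(0, p))
        self.vertical = nn.Conv2d(channels, channels, (k, 1), padding=(p, 0))
        self.temporal = nn.Conv1d(channels, channels, kernel_size=num_frames)
        self.act = nn.ReLU(inplace=True)

    def forward(self, clip):                              # clip: (B, T, C, H, W)
        b, t, c, h, w = clip.shape
        x = clip.reshape(b * t, c, h, w)
        x = self.act(self.horizontal(x) + self.vertical(x))       # spatial branches
        x = x.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1)        # (B, H, W, C, T)
        x = self.temporal(x.reshape(b * h * w, c, t))              # collapse T frames
        return x.reshape(b, h, w, c).permute(0, 3, 1, 2)           # (B, C, H, W)
```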
Citations: 0
Attention-injective scale aggregation network for crowd counting
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053008
Haojie Zou, Yingchun Kuang, Jianqiang Luo, Mingwei Yao, Haoyu Zhou, Sha Yang
Crowd counting has gained widespread attention in the fields of public safety management, video surveillance, and emergency response. Currently, background interference and scale variation of the head are still intractable problems. We propose an attention-injective scale aggregation network (ASANet) to cope with the above problems. ASANet consists of three parts: shallow feature attention network (SFAN), multi-level feature aggregation (MLFA) module, and density map generation (DMG) network. SFAN effectively overcomes the noise impact of a cluttered background by cross-injecting the attention module in the truncated VGG16 structure. To fully utilize the multi-scale crowd information embedded in the feature layers at different positions, we densely connect the multi-layer feature maps in the MLFA module to solve the scale variation problem. In addition, to capture large-scale head information, the DMG network introduces successive dilated convolutional layers to further expand the receptive field of the model, thus improving the accuracy of crowd counting. We conduct extensive experiments on five public datasets (ShanghaiTech Part_A, ShanghaiTech Part_B, UCF_QNRF, UCF_CC_50, JHU-Crowd++), and the results show that ASANet outperforms most of the existing methods in terms of counting and at the same time demonstrates satisfactory superiority in dealing with background noise in different scenes.
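The role of the successive dilated convolutions in the DMG network can be illustrated with a short back-end sketch: stacked dilated 3x3 layers enlarge the receptive field without further down-sampling, and a final 1x1 layer regresses a density map whose sum approximates the crowd count. Channel widths and dilation rates here are illustrative guesses, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class DensityMapHead(nn.Module):
    """Successive dilated 3x3 convolutions followed by a 1x1 density layer."""
    def __init__(self, in_channels=512):
        super().__init__()
        layers, c = [], in_channels
        for out_c, d in [(512, 2), (256, 2), (128, 2), (64, 2)]:
            layers += [nn.Conv2d(c, out_c, 3, padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
            c = out_c
        self.body = nn.Sequential(*layers)
        self.density = nn.Conv2d(c, 1, kernel_size=1)

    def forward(self, x):                    # x: backbone features, e.g. (B, 512, H/8, W/8)
        return self.density(self.body(x))    # predicted density map; its sum approximates the count
```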
Citations: 0
Infrared and visible image fusion based on global context network
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053016
Yonghong Li, Yu Shi, Xingcheng Pu, Suqiang Zhang
The fusion of infrared and visible images combines thermal radiation and texture data from two different sensor types into a single image. In recent years, deep-learning-based convolutional neural networks (CNNs) have become the mainstream technology for many infrared and visible image fusion methods, but these methods often extract only shallow features and ignore the role of long-range dependencies in the fusion task. Moreover, due to its local perception characteristics, a CNN can only obtain global contextual information by continuously stacking convolutional layers, which leads to low network efficiency and difficulty in optimization. To address this issue, we propose a global context fusion network (GCFN) that models context using a global attention pool and adopts a two-stage strategy. First, a GCFN-based autoencoder network is trained to extract multi-scale local and global contextual features. Second, to effectively incorporate the complementary information of the input images, a dual-branch fusion network combining a CNN and a transformer is designed. Experimental results on a publicly available dataset demonstrate that the proposed method outperforms nine advanced methods in fusion performance on both subjective and objective metrics.
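A common way to realize a "global attention pool" of the kind described above is a global context block: a 1x1 convolution predicts per-position attention weights, the features are pooled into a single context vector, transformed, and broadcast back to every position. The sketch below follows that generic formulation and is only an assumption about what the context modeling in GCFN might look like.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Global attention pooling followed by a bottleneck transform and broadcast add."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):                                       # x: (B, C, H, W)
        w_attn = self.attn(x).flatten(2).softmax(dim=-1)        # (B, 1, H*W)
        context = torch.bmm(x.flatten(2), w_attn.transpose(1, 2))  # (B, C, 1)
        context = self.transform(context.unsqueeze(-1))         # (B, C, 1, 1)
        return x + context                                      # broadcast over H, W
```

Because the pooled context is shared by every position, the block captures long-range dependencies at roughly the cost of a single 1x1 convolution.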
Citations: 0
Generative object separation in X-ray images
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053004
Xiaolong Zheng, Yu Zhou, Jia Yao, Liang Zheng
X-ray imaging is essential for security inspection; nevertheless, the penetrability of X-rays can cause objects within a package to overlap in X-ray images, leading to reduced accuracy in manual inspection and increased difficulty in auxiliary inspection techniques. Existing methods mainly focus on object detection to enhance the detection ability of models for overlapping regions by augmenting image features, including color, texture, and semantic information. However, these approaches do not address the underlying issue of overlap. We propose a novel method for separating overlapping objects in X-ray images from the perspective of image inpainting. Specifically, the separation method involves using a vision transformer (ViT) to construct a generative adversarial network (GAN) model that requires a hand-created trimap as input. In addition, we present an end-to-end approach that integrates Mask Region-based Convolutional Neural Network with the separation network to achieve fully automated separation of overlapping objects. Given the lack of datasets appropriate for training separation networks, we created MaskXray, a collection of X-ray images that includes overlapping images, trimap, and individual object images. Our proposed generative separation network was tested in experiments and demonstrated its ability to accurately separate overlapping objects in X-ray images. These results demonstrate the efficacy of our approach and make significant contributions to the field of X-ray image analysis.
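One way to obtain a trimap of the kind described above, when starting from an instance mask (for example, one produced by Mask R-CNN), is to derive an uncertain band around the mask boundary with simple morphology; the sketch below uses max-pooling for dilation and erosion and then concatenates the trimap with the X-ray image as generator input. The band width, channel layout, and this particular trimap recipe are illustrative assumptions, not the paper's procedure.

```python
import torch
import torch.nn.functional as F

def mask_to_trimap(mask, band=7):
    """Turn a float binary mask (B, 1, H, W) in [0, 1] into a 3-channel trimap.

    The band between a dilated and an eroded mask marks the uncertain
    (potentially overlapping) region around the object boundary.
    """
    pad = band // 2
    dilated = F.max_pool2d(mask, band, stride=1, padding=pad)
    eroded = 1.0 - F.max_pool2d(1.0 - mask, band, stride=1, padding=pad)
    overlap_band = dilated - eroded                   # uncertain boundary band
    background = 1.0 - dilated
    return torch.cat([eroded, overlap_band, background], dim=1)   # (B, 3, H, W)

def generator_input(xray, trimap):
    """Concatenate the grayscale X-ray image with the trimap along channels."""
    return torch.cat([xray, trimap], dim=1)           # (B, 1 + 3, H, W)
```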
Citations: 0
USDAP: universal source-free domain adaptation based on prompt learning
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053015
Xun Shao, Mingwen Shao, Sijie Chen, Yuanyuan Liu
Universal source-free domain adaptation (USFDA) aims to explore transferring domain-consistent knowledge in the presence of domain shift and category shift, without access to a source domain. Existing works mainly rely on prior domain-invariant knowledge provided by the source model, ignoring the significant discrepancy between the source and target domains. However, directly utilizing the source model will generate noisy pseudo-labels on the target domain, resulting in erroneous decision boundaries. To alleviate this issue, we propose a two-stage USFDA approach based on prompt learning, named USDAP. First, to reduce domain differences, during the prompt learning stage we introduce a learnable prompt designed to align the target domain distribution with the source. Furthermore, for more discriminative decision boundaries, in the feature alignment stage we propose an adaptive global-local clustering strategy. This strategy utilizes one-versus-all clustering globally to separate different categories and neighbor-to-neighbor clustering locally to prevent incorrect pseudo-label assignments at cluster boundaries. Based on this two-stage method, target data are adapted to the classification network under the prompt's guidance, forming more compact category clusters and thus achieving excellent migration performance. We conduct experiments on various datasets with diverse category shift scenarios to illustrate the superiority of USDAP.
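Prompt learning for adaptation is often realized as a small set of learnable tokens prepended to the patch tokens of a frozen source encoder, so that only the prompt parameters are updated on the target domain. The sketch below shows that generic pattern; the token count, embedding size, and injection point are assumptions, not USDAP's exact design.

```python
import torch
import torch.nn as nn

class PromptedFeatureExtractor(nn.Module):
    """Learnable prompt tokens prepended to patch tokens of a frozen source encoder."""
    def __init__(self, encoder: nn.Module, embed_dim=768, num_prompts=8):
        super().__init__()
        self.encoder = encoder                    # frozen source model body
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, patch_tokens):              # (B, N, D) patch embeddings
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        return self.encoder(tokens)               # target features nudged toward the source
```

Only `self.prompts` receives gradients during target-domain training, which keeps the source model's knowledge intact while reducing the distribution gap.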
Citations: 0
Double-level deep multi-view collaborative learning for image clustering
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053012
Liang Xiao, Wenzhe Liu
Multi-view clustering has garnered significant attention due to its ability to explore shared information from multiple views. Applications of multi-view clustering include image and video analysis, bioinformatics, and social network analysis, in which integrating diverse data sources enhances data understanding and insights. However, existing multi-view models suffer from the following limitations: (1) directly extracting latent representations from raw data using encoders is susceptible to interference from noise and other factors and (2) complementary information among different views is often overlooked, resulting in the loss of crucial unique information from each view. Therefore, we propose a distinctive double-level deep multi-view collaborative learning approach. Our method further processes the latent representations learned by the encoder through multiple layers of perceptrons to obtain richer semantic information. In addition, we introduce dual-path guidance at both the feature and label levels to facilitate the learning of complementary information across different views. Furthermore, we introduce pre-clustering methods to guide mutual learning among different views through pseudo-labels. Experimental results on four image datasets (Caltech-5V, STL10, Cifar10, Cifar100) demonstrate that our method achieves state-of-the-art clustering performance, evaluated using standard metrics, including accuracy, normalized mutual information, and purity. We compare our proposed method with existing clustering algorithms to validate its effectiveness.
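The double-level idea, feature-level and label-level guidance on top of each view's encoder latent, can be sketched with two small heads per view: an MLP that refines the latent into a normalized semantic feature for cross-view alignment, and a cluster head that outputs soft assignments that pseudo-labels can supervise. Layer sizes and the number of clusters below are placeholders, not the paper's settings.

```python
import torch.nn as nn
import torch.nn.functional as F

class ViewHeads(nn.Module):
    """Feature-level and label-level heads applied to one view's latent representation."""
    def __init__(self, latent_dim=512, feature_dim=128, num_clusters=10):
        super().__init__()
        self.feature_head = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(inplace=True),
            nn.Linear(latent_dim, feature_dim),
        )
        self.cluster_head = nn.Sequential(
            nn.Linear(latent_dim, num_clusters), nn.Softmax(dim=1),
        )

    def forward(self, z):                          # z: (B, latent_dim) from the view encoder
        feat = F.normalize(self.feature_head(z), dim=1)   # for cross-view feature alignment
        assign = self.cluster_head(z)              # (B, num_clusters) soft cluster assignments
        return feat, assign
```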
Citations: 0
Vis-YOLO: a lightweight and efficient image detector for unmanned aerial vehicle small objects
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053003
Xiangyu Deng, Jiangyong Du
YOLO-series models are widely used in the domain of object detection. Aiming at the challenge of small object detection, we analyze the limitations of existing detection models and propose a Vis-YOLO object detection algorithm based on YOLOv8s. First, the number of down-sampling operations is reduced to retain more features, and the detection head is replaced to adapt to small objects. Then, deformable convolutional networks are used to improve the C2f module, enhancing its feature extraction ability. Finally, the separation and enhancement attention module is introduced to the model to give more weight to useful information. Experiments show that the improved Vis-YOLO model outperforms the YOLOv8s model on the VisDrone-2019 dataset: precision improves by 5.4%, recall by 6.3%, and mAP50 by 6.8%. Moreover, Vis-YOLO models are smaller and suitable for mobile deployment. This research provides a new method and idea for small object detection with strong potential application value.
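The deformable-convolution upgrade mentioned above can be sketched with torchvision's DeformConv2d: a small convolution predicts per-location sampling offsets (two values per kernel tap), and the deformable convolution samples the input at those shifted positions, which helps the kernel follow small, irregularly shaped objects. The residual block structure and channel width below are illustrative, not the actual C2f modification in Vis-YOLO.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBottleneck(nn.Module):
    """Residual bottleneck whose 3x3 convolution is deformable."""
    def __init__(self, channels=128):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, kernel_size=3, padding=1)  # (dx, dy) per tap
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):                         # x: (B, C, H, W)
        offsets = self.offset(x)                  # predicted sampling offsets
        return x + self.act(self.bn(self.deform(x, offsets)))   # residual connection
```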
Citations: 0
Appearance flow based structure prior guided image inpainting
IF 1.1, CAS Zone 4 (Computer Science), Q4 (Engineering, Electrical & Electronic), Pub Date: 2024-09-01, DOI: 10.1117/1.jei.33.5.053011
Weirong Liu, Zhijun Li, Changhong Shi, Xiongfei Jia, Jie Liu
Image inpainting techniques based on deep learning have shown significant improvements by introducing structure priors, but they still produce structural distortion or fuzzy textures in large missing areas. This is mainly because cascaded networks have an inherent disadvantage: employing an unreasonable structural prior inevitably leads to severe mistakes in the second stage of the cascade inpainting framework. To address this issue, an appearance flow-based structure prior (AFSP) guided image inpainting method is proposed. In the first stage, a structure generator treats edge-preserved smooth images as the global structure of the image, and appearance flow then warps small-scale features in the input and flows them to the corrupted regions. In the second stage, a texture generator using contextual attention is designed to yield high-frequency image details after a reasonable structure prior has been obtained. Compared with state-of-the-art approaches, the proposed AFSP achieves visually more realistic results. On the Places2 dataset, the most challenging benchmark with 1.8 million high-resolution images of 365 complex scenes, AFSP exceeds the average peak signal-to-noise ratio of EdgeConnect by 1.1731 dB.
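The appearance-flow warping step can be illustrated with torch's grid_sample: a predicted per-pixel flow field tells each corrupted location where to borrow features from, and the feature map is resampled accordingly. The flow convention here (pixel offsets added to an identity grid) is an assumption made for this sketch, not necessarily the parameterization used by AFSP.

```python
import torch
import torch.nn.functional as F

def warp_by_appearance_flow(features, flow):
    """Warp encoder features with a predicted appearance-flow field.

    features: (B, C, H, W); flow: (B, 2, H, W) per-pixel (dx, dy) offsets in pixels.
    """
    b, _, h, w = features.shape
    # Identity sampling grid in normalized [-1, 1] coordinates (x first, as grid_sample expects).
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=features.device),
        torch.linspace(-1, 1, w, device=features.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)   # (B, H, W, 2)
    # Convert pixel offsets to the normalized coordinate range.
    offset = torch.stack((flow[:, 0] * 2 / max(w - 1, 1),
                          flow[:, 1] * 2 / max(h - 1, 1)), dim=-1)
    return F.grid_sample(features, base + offset, mode="bilinear",
                         padding_mode="border", align_corners=True)
```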
Citations: 0