
Latest Publications in the Journal of Electronic Imaging

Improved self-supervised learning for disease identification in chest X-ray images
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043006
Yongjun Ma, Shi Dong, Yuchao Jiang
Analyzing chest X-ray (CXR) images to assist disease diagnosis is an important application of artificial intelligence. Supervised learning faces challenges due to the scarcity of large-scale labeled datasets and labeling inaccuracies. Self-supervised learning offers a potential solution, but research in this area is limited, and diagnostic accuracy remains unsatisfactory. We propose an approach that applies the self-supervised Bidirectional Encoder Representations from Image Transformers version 2 (BEiTv2) method, with its vector quantization-based knowledge distillation (VQ-KD) strategy, to CXR image data to enhance disease diagnosis accuracy. Our methodology outperforms existing self-supervised methods, demonstrating its efficacy in improving diagnostic outcomes. Through transfer and ablation studies, we elucidate the benefits of the VQ-KD strategy in enhancing model performance and transferability to downstream tasks.
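Where the abstract is terse on mechanics, a toy sketch may help: in BEiT-style pretraining with a VQ tokenizer, image patches are masked and the model is trained to predict the discrete code of each masked patch. Everything below (module sizes, the randomly initialized toy tokenizer, the masking ratio) is an illustrative assumption in PyTorch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVQTokenizer(nn.Module):
    """Maps each patch embedding to the index of its nearest codebook entry."""
    def __init__(self, dim=64, codebook_size=512):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    @torch.no_grad()
    def forward(self, patches):                       # patches: (B, N, dim)
        codes = self.codebook.weight.unsqueeze(0).expand(patches.size(0), -1, -1)
        return torch.cdist(patches, codes).argmin(dim=-1)  # token ids (B, N)

class MaskedPatchPredictor(nn.Module):
    """Transformer encoder that predicts the token id of each masked patch."""
    def __init__(self, dim=64, codebook_size=512, n_patches=196):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, patches, mask):                 # mask: (B, N) bool
        x = torch.where(mask.unsqueeze(-1),
                        self.mask_token.expand_as(patches), patches)
        return self.head(self.encoder(x + self.pos))  # logits (B, N, K)

B, N, D = 2, 196, 64
patches = torch.randn(B, N, D)            # stand-in for CXR patch embeddings
mask = torch.rand(B, N) < 0.4             # mask roughly 40% of patches
tokenizer, student = ToyVQTokenizer(D), MaskedPatchPredictor(D)
targets = tokenizer(patches)
logits = student(patches, mask)
loss = F.cross_entropy(logits[mask], targets[mask])  # loss on masked patches only
print(loss.item())
```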
Citations: 0
Three-dimensional human pose estimation based on contact pressure
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043022
Ning Yin, Ke Wang, Nian Wang, Jun Tang, Wenxia Bao
Various daily behaviors, such as lying, walking, and sitting, exert pressure on the contact surface, so the pressure data from that surface contain important biological information about an individual. Recently, a computer vision task, pose estimation from contact pressure (PECP), has received increasing attention from researchers. Although several deep learning-based methods have been put forward in this field, they cannot achieve accurate prediction from the limited pressure information. To address this issue, we present a multi-task-based PECP model. Specifically, an autoencoder is introduced into our model to reconstruct the input pressure data (the additional task), which helps our model generate high-quality features for the pressure data. Moreover, both the mean squared error and the spectral angle distance are adopted to construct the final loss function, whose aim is to eliminate the Euclidean distance and angular differences between the prediction and the ground truth. Extensive experiments on a public dataset show that our method significantly outperforms existing methods in pose prediction from contact pressure.
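The combined objective described above lends itself to a compact sketch: a mean-squared-error term plus a spectral-angle term, the angle between the flattened predicted and ground-truth pose vectors. The joint layout (17 joints in 3D) and the weighting factor below are assumptions for illustration, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def pecp_loss(pred, target, angle_weight=0.5, eps=1e-7):
    """pred, target: (B, J, 3) predicted and ground-truth 3D joint positions."""
    mse = F.mse_loss(pred, target)
    # Spectral angle distance: angle between the flattened pose vectors.
    cos = F.cosine_similarity(pred.flatten(1), target.flatten(1), dim=1)
    sad = torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()
    return mse + angle_weight * sad

pred = torch.randn(4, 17, 3, requires_grad=True)
target = torch.randn(4, 17, 3)
loss = pecp_loss(pred, target)
loss.backward()                # both terms are differentiable
print(loss.item())
```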
Citations: 0
Pose-guided node and trajectory construction transformer for occluded person re-identification
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043021
Chentao Hu, Yanbing Chen, Lingyi Guo, Lingbing Tao, Zhixin Tie, Wei Ke
Occluded person re-identification (re-id) is a pedestrian-retrieval task in which occluded person images are matched with holistic person images. Most methods leverage semantic cues from external models to align the available visible parts in the feature space. However, presenting visible parts while discarding occluded parts can lose the semantics of the occluded regions, and in severely crowded occlusion regions it introduces inaccurate features that pollute the overall person features. Thus, constructing person features for occluded regions from the features of the holistic parts has the potential to address these issues. In this work, we propose a pose-guided node and trajectory construction transformer (PNTCT). The part feature extraction module extracts part features of the person and incorporates pose information to activate key visible local features. However, this is not sufficient to completely separate occluded regions. To further distinguish visible and occluded parts, the skeleton graph module adopts a graph topology to represent local features as graph nodes, enhancing the network's sensitivity to local features by constructing a skeleton feature graph, which is further utilized to weaken the occlusion noise. The node and trajectory construction (NTC) module mines the relationships between skeleton nodes and aggregates the person's skeleton information to construct a novel skeleton graph. The features of the occluded regions can then be reconstructed via the features of the corresponding nodes in this graph. Extensive experiments and analyses confirm the effectiveness and superiority of our PNTCT method.
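As a rough illustration of the node-reconstruction idea, the sketch below passes messages over a skeleton graph and replaces poorly visible node features with aggregated context from the visible ones. The learnable adjacency, feature sizes, and visibility scores are assumptions for illustration, not the PNTCT architecture.

```python
import torch
import torch.nn as nn

class SkeletonGraphLayer(nn.Module):
    def __init__(self, dim, n_nodes):
        super().__init__()
        # Learnable adjacency, initialized to a uniform fully connected graph.
        self.adj = nn.Parameter(torch.ones(n_nodes, n_nodes) / n_nodes)
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes, visibility):
        # nodes: (B, N, dim); visibility: (B, N) in [0, 1], e.g. from a pose estimator.
        weighted = nodes * visibility.unsqueeze(-1)   # mute messages from occluded nodes
        agg = torch.einsum('mn,bnd->bmd', self.adj.softmax(dim=-1), weighted)
        vis = visibility.unsqueeze(-1)
        # Keep a node's own feature where visible; reconstruct from context where not.
        return vis * nodes + (1 - vis) * torch.relu(self.proj(agg))

layer = SkeletonGraphLayer(dim=256, n_nodes=14)
nodes = torch.randn(2, 14, 256)
visibility = torch.rand(2, 14)          # stand-in keypoint confidences
print(layer(nodes, visibility).shape)   # torch.Size([2, 14, 256])
```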
Citations: 0
Deep degradation-aware up-sampling-based depth video coding
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043009
Zhaoqing Pan, Yuqing Niu, Bo Peng, Ge Li, Sam Kwong, Jianjun Lei
The smooth regions in depth videos contain a significant proportion of homogeneous content, resulting in many spatial redundancies. To improve the coding efficiency of depth videos, this paper proposes a deep degradation-aware up-sampling-based depth video coding method. To reduce spatial redundancies effectively, the proposed method compresses the depth video at a low resolution and restores the resolution using a learning-based up-sampling technique. To recover high-quality depth videos, a degradation-aware up-sampling network is proposed, which exploits the degradation information of compression artifacts and sampling artifacts to restore the resolution. Specifically, a compression artifact removal module obtains refined low-resolution depth frames by learning a representation of the compression artifacts. Meanwhile, a jointly optimized learning strategy is designed to enhance the capability of recovering high-frequency details, which benefits up-sampling. According to the experimental results, the proposed method achieves considerable depth video coding performance compared with 3D-HEVC.
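A conceptual sketch of the pipeline: down-sample the depth frame before compression, then restore it with a learned network that first removes compression artifacts and then super-resolves. The bicubic down-sampling and crude quantization stand in for the codec (the paper targets 3D-HEVC), and the tiny two-stage CNN is an illustrative assumption, not the proposed network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpSamplingRestorer(nn.Module):
    """Removes compression artifacts, then super-resolves the depth frame."""
    def __init__(self, scale=2):
        super().__init__()
        self.deartifact = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))
        self.upsample = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))                 # sub-pixel up-sampling

    def forward(self, decoded_lr):
        refined = decoded_lr + self.deartifact(decoded_lr)  # residual cleanup
        return self.upsample(refined)

depth = torch.rand(1, 1, 128, 128)                  # original depth frame
lr = F.interpolate(depth, scale_factor=0.5,
                   mode='bicubic', align_corners=False)  # pre-codec down-sampling
decoded = (lr * 255).round() / 255                  # crude stand-in for codec loss
restored = UpSamplingRestorer(scale=2)(decoded)
print(restored.shape)                               # torch.Size([1, 1, 128, 128])
```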
Citations: 0
Research on ground-based cloud image classification combining local and global features
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043030
Xin Zhang, Wanting Zheng, Jianwei Zhang, Weibin Chen, Liangliang Chen
Clouds are an important factor in predicting future weather changes, and cloud image classification is one of the basic issues in ground-based cloud meteorological observation. Deep CNNs mainly focus on local receptive fields, so their processing of global information is relatively weak. In ground-based cloud image classification with complex backgrounds, globally capturing the relationships between different locations in an image helps to better model its long-range dependencies. We propose a ground-based cloud image classification method based on the fusion of local and global features (LG_CloudNet). It integrates the global feature extraction module (GF_M) and the local feature extraction module (LF_M), using an attention mechanism to weight and merge the respective features. The LG_CloudNet model enables richer and more comprehensive feature representation at lower computational complexity. To ensure the learning and generalization capabilities of the model during training, AdamW (Adam with weight decay) is combined with learning-rate warm-up and stochastic gradient descent with warm restarts to adjust the learning rate. The experimental results demonstrate that the proposed method achieves favorable ground-based cloud image classification outcomes and exhibits robust performance. On the GCD, CCSN, and ZNCL datasets, the classification accuracy is 94.94%, 95.77%, and 98.87%, respectively.
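The optimization recipe in the abstract is standard enough to sketch directly: AdamW combined with a linear learning-rate warm-up followed by cosine annealing with warm restarts. The placeholder model, epoch counts, and hyperparameters below are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import (LinearLR, CosineAnnealingWarmRestarts,
                                      SequentialLR)

model = nn.Linear(128, 7)                       # placeholder classifier
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
warmup = LinearLR(opt, start_factor=0.1, total_iters=5)       # 5 warm-up epochs
restarts = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2) # restart at 10, 30, ...
sched = SequentialLR(opt, schedulers=[warmup, restarts], milestones=[5])

for epoch in range(40):
    # ... one training epoch over the cloud-image batches would go here ...
    opt.step()          # normally called once per batch after loss.backward()
    sched.step()        # epoch-level schedule update
    print(epoch, opt.param_groups[0]['lr'])
```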
Citations: 0
Lightweight human activity recognition system for resource constrained environments
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043025
Mihir Karandikar, Ankit Jain, Abhishek Srivastava
As the elderly population in need of assisted living arrangements continues to grow, ensuring their safety is paramount. Though effective, traditional surveillance methods, notably RGB cameras, raise significant privacy concerns. This paper highlights the advantages of a surveillance system that addresses these issues by utilizing skeleton joint sequences extracted from depth data. The focus on non-intrusive parameters mitigates ethical and privacy concerns. Moreover, the proposed work prioritizes resource efficiency, acknowledging the often limited computing resources in assisted living environments. We strive for a method that runs efficiently even in the most resource-constrained environments. Performance evaluation and a prototypical implementation of our method on a resource-constrained device confirm the efficacy and suitability of the proposed method in real-world applications.
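In the spirit of the resource-efficiency argument above, here is a sketch of a deliberately small classifier over depth-derived skeleton joint sequences, with no RGB input. The joint count, clip length, and class count are illustrative assumptions rather than the paper's model.

```python
import torch
import torch.nn as nn

class TinySkeletonHAR(nn.Module):
    def __init__(self, n_joints=20, n_classes=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_joints * 3, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, n_classes))

    def forward(self, x):              # x: (B, T, n_joints, 3) joint coordinates
        b, t, j, c = x.shape
        # Treat the flattened joints as channels and convolve over time.
        return self.net(x.reshape(b, t, j * c).transpose(1, 2))

model = TinySkeletonHAR()
clip = torch.randn(4, 60, 20, 3)       # 60 frames of 20 skeleton joints
print(model(clip).shape)               # torch.Size([4, 8])
print(sum(p.numel() for p in model.parameters()))  # parameter count stays small
```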
Citations: 0
SDANet: scale-deformation awareness network for crowd counting
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043002
Jianyong Wang, Xiangyu Guo, Qilei Li, Ahmed M. Abdelmoniem, Mingliang Gao
Crowd counting aims to derive information about crowd density by quantifying the number of individuals in an image or video. It offers crucial insights applicable to various domains, e.g., security, efficient decision-making, and management. However, scale variation and irregular head shapes pose intricate challenges. To address these challenges, we propose a scale-deformation awareness network (SDANet). Specifically, a scale awareness module is introduced to address scale variation. It captures long-distance dependencies and preserves precise spatial information by readjusting weights along the height and width directions. Concurrently, a deformation awareness module is introduced to handle head deformation. It adjusts the sampling positions of the convolution kernel through deformable convolution and learned offsets. Experimental results on four crowd-counting datasets prove the superiority of SDANet in accuracy, efficiency, and robustness.
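The height/width reweighting described for the scale awareness module resembles coordinate attention, which can be sketched briefly: pool along each spatial axis, derive per-direction gates, and rescale the feature map. The channel sizes and reduction ratio below are assumptions, not SDANet's exact module.

```python
import torch
import torch.nn as nn

class ScaleAwareness(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.squeeze = nn.Sequential(nn.Conv2d(channels, mid, 1), nn.ReLU())
        self.attn_h = nn.Conv2d(mid, channels, 1)
        self.attn_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)     # (B, C, H, 1): one summary per row
        pool_w = x.mean(dim=2, keepdim=True)     # (B, C, 1, W): one summary per column
        # Process both directions jointly, then split back.
        y = self.squeeze(torch.cat([pool_h, pool_w.transpose(2, 3)], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                   # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.transpose(2, 3)))   # (B, C, 1, W)
        return x * a_h * a_w                     # reweight rows and columns

feat = torch.randn(2, 64, 32, 48)
print(ScaleAwareness(64)(feat).shape)            # torch.Size([2, 64, 32, 48])
```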
Citations: 0
Underwater object detection by integrating YOLOv8 and efficient transformer
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043011
Jing Liu, Kaiqiong Sun, Xiao Ye, Yaokun Yun
In recent years, underwater target detection algorithms based on deep learning have greatly advanced marine science and underwater robotics. However, the complexity of the underwater environment introduces problems such as target occlusion, overlap, background confusion, and small objects, which make detection difficult. To address these issues, this paper proposes an improved underwater target detection method based on YOLOv8s. First, a lightweight backbone network with efficient transformers replaces the original backbone, enhancing the contextual feature extraction capability. Second, an improved bidirectional feature pyramid network is used in the subsequent multi-scale fusion stage, increasing the input of bottom-level information while reducing the model size and number of parameters. Finally, a dynamic head with an attention mechanism is introduced into the detection head to improve the classification and localization of small and fuzzy targets. Experimental results show that the proposed method improves the mAP0.5:0.95 of YOLOv8s from 65.7%, 63.7%, and 51.2% to 69.2%, 66.8%, and 54.8% on three public underwater datasets, DUO, RUOD, and URPC2020, respectively. Additionally, compared with the YOLOv8s model, the model size decreases from 21.46 to 15.56 MB, and the number of parameters decreases from 11.1 to 7.9 M.
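The bidirectional feature pyramid mentioned above rests on weighted multi-scale fusion, sketched below as a fast-normalized weighted sum of resized feature maps with learnable weights (the BiFPN fusion rule). The scales and channel count are illustrative; this is a sketch of the fusion idea, not the paper's exact neck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuses same-channel feature maps after resizing them to one resolution."""
    def __init__(self, n_inputs, channels, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))   # one learnable weight per input
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.eps = eps

    def forward(self, feats, out_size):
        feats = [F.interpolate(f, size=out_size, mode='nearest') for f in feats]
        w = F.relu(self.w)
        w = w / (w.sum() + self.eps)                  # fast normalized fusion
        fused = sum(wi * fi for wi, fi in zip(w, feats))
        return self.conv(F.relu(fused))

p3 = torch.randn(1, 128, 80, 80)    # high-resolution, bottom-level details
p4 = torch.randn(1, 128, 40, 40)
p5 = torch.randn(1, 128, 20, 20)
fuse = WeightedFusion(n_inputs=3, channels=128)
print(fuse([p3, p4, p5], out_size=(40, 40)).shape)   # torch.Size([1, 128, 40, 40])
```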
Citations: 0
Radar spectrum-image fusion using dual 2D-3D convolutional neural network to transformer inspired multi-headed self-attention bi-long short-term memory network for vehicle recognition
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043010
Ferris I. Arnous, Ram M. Narayanan
Radar imaging techniques, such as synthetic aperture radar, are widely explored in automatic vehicle recognition algorithms for remote sensing tasks. A large body of literature covering several machine learning methodologies, including visual information transformers, self-attention, convolutional neural networks (CNNs), long short-term memory (LSTM), CNN-LSTM, CNN-attention-LSTM, and CNN Bi-LSTM models for the detection of military vehicles, reports high performance using combinations of these approaches. Tradeoffs between differing numbers of poses, single versus multiple feature extraction streams, the use of signals and/or images, and the specific mechanisms for combining them have been widely debated. We propose adapting several models into a unique biologically inspired architecture that utilizes both multi-pose and multi-contextual image and signal radar sensor information to make vehicle assessments over time. We implement a compact multi-pose 3D CNN stream to process and fuse multi-temporal images, while a sister 2D CNN stream processes the same information over a lower-dimensional power-spectral domain, mimicking the way multi-sequence visual imagery is combined with auditory feedback for enhanced situational awareness. These data are then fused across data domains using transformer-modified encoding blocks feeding Bi-LSTM segments. Classification results on a fundamentally controlled simulated dataset yielded accuracies of up to 98% and 99%, in line with the literature. This enhanced performance was then evaluated for robustness, not previously explored, under three simultaneous parameterizations of incidence angle, object orientation, and lowered signal-to-noise ratio, and was found to improve recognition in all three cases in low- to moderate-noise environments.
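A compact sketch of the dual-stream layout described above: a 3D CNN over the multi-pose image sequence, a 2D CNN over per-step power-spectral slices, and a bidirectional LSTM over the fused per-step features. All tensor sizes and the single-layer modules are stand-ins for illustration, not the published network.

```python
import torch
import torch.nn as nn

class DualStreamBiLSTM(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.cnn3d = nn.Sequential(                 # image-sequence stream
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)))     # None keeps the temporal axis
        self.cnn2d = nn.Sequential(                 # power-spectral stream
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)))
        self.lstm = nn.LSTM(8 * 16 * 2, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, frames, spectra):
        # frames: (B, 1, T, H, W); spectra: (B, T, 1, F1, F2) per-step spectrograms
        b, _, t = frames.shape[:3]
        img = self.cnn3d(frames).permute(0, 2, 1, 3, 4).reshape(b, t, -1)
        spec = self.cnn2d(spectra.flatten(0, 1)).reshape(b, t, -1)
        seq, _ = self.lstm(torch.cat([img, spec], dim=-1))  # fuse per time step
        return self.head(seq[:, -1])                # classify from the last step

model = DualStreamBiLSTM()
frames = torch.randn(2, 1, 6, 32, 32)    # six radar image poses
spectra = torch.randn(2, 6, 1, 32, 32)   # matching power-spectral slices
print(model(frames, spectra).shape)      # torch.Size([2, 10])
```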
Citations: 0
No-reference video quality assessment based on human visual perception
IF 1.1 | CAS Zone 4, Computer Science | Q4 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1117/1.jei.33.4.043029
Zhou Zhou, Guangqian Kong, Xun Duan, Huiyun Long
Conducting video quality assessment (VQA) for user-generated content (UGC) videos and achieving consistency with subjective quality assessment are highly challenging tasks. We propose a no-reference video quality assessment (NR-VQA) method for UGC scenarios by considering characteristics of human visual perception. To distinguish between varying levels of human attention within different regions of a single frame, we devise a dual-branch network. This network extracts spatial features containing positional information of moving objects from frame-level images. In addition, we employ the temporal pyramid pooling module to effectively integrate temporal features of different scales, enabling the extraction of inter-frame temporal information. To mitigate the time-lag effect in the human visual system, we introduce the temporal pyramid attention module. This module evaluates the significance of individual video frames and simulates the varying attention levels exhibited by humans towards frames. We conducted experiments on the KoNViD-1k, LIVE-VQC, CVD2014, and YouTube-UGC databases. The experimental results demonstrate the superior performance of our proposed method compared to recent NR-VQA techniques in terms of both objective assessment and consistency with subjective assessment.
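The temporal pyramid pooling step admits a short sketch: per-frame features are average-pooled at several temporal scales and the pooled summaries concatenated, so short- and long-range temporal structure both survive. The feature size and pyramid levels below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def temporal_pyramid_pool(feats, levels=(1, 2, 4)):
    """feats: (B, T, D) per-frame features -> (B, sum(levels) * D)."""
    x = feats.transpose(1, 2)                       # (B, D, T) for 1D pooling
    pooled = [F.adaptive_avg_pool1d(x, l).flatten(1) for l in levels]
    return torch.cat(pooled, dim=1)                 # concatenate all scales

frame_feats = torch.randn(2, 30, 256)               # features for 30 frames
video_feat = temporal_pyramid_pool(frame_feats)
print(video_feat.shape)                             # torch.Size([2, 1792])
```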
Citations: 0