
Displays: Latest Publications

ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-24 | DOI: 10.1016/j.displa.2024.102798
Md. Bipul Hossen, Zhongfu Ye, Amr Abdussalam, Mohammad Alamgir Hossain

Fine-grained image captioning is a focal point in the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and utilization of attributes play a crucial role in enhancing image captioning performance. Despite progress, prior attribute-related methods either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step in the language model. However, these approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, which initially predicts linguistic context-related attributes and then uses prior probabilities from the IAP module to rebalance image- and linguistic-context-related attributes, thereby generating more robust and enhanced attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in the proposed image captioning with enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall model performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K under cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.
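The rebalancing idea at the core of the EAP can be sketched in a few lines: blend the image-level attribute prior from the IAP with the linguistic context-related probabilities, then renormalize. This is a hedged illustration, not the paper's implementation; the function name, the convex-combination form, and the `alpha` weight are assumptions of this sketch.

```python
import numpy as np

def rebalance_attributes(p_iap, p_eap, alpha=0.5):
    """Blend image-level attribute priors (IAP) with linguistic
    context-related attribute probabilities (EAP), then renormalize.

    p_iap, p_eap: 1-D arrays of per-attribute probabilities.
    alpha: weight on the image prior (hypothetical knob, not from the paper).
    """
    fused = alpha * p_iap + (1.0 - alpha) * p_eap
    return fused / fused.sum()
```

With `alpha = 0.5` the two sources contribute equally; the paper instead derives the balance from the IAP's prior probabilities rather than a fixed scalar.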

Citations: 0
DBMKA-Net: Dual branch multi-perception kernel adaptation for underwater image enhancement
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-22 | DOI: 10.1016/j.displa.2024.102797
Hongjian Wang, Suting Chen

In recent years, due to wavelength-dependent light absorption and scattering, underwater photographs captured by devices often exhibit blurriness, faded color tones, and low contrast. To address these challenges, convolutional neural networks (CNNs), with their robust feature-capturing capabilities and adaptable structures, have been employed for underwater image enhancement. However, most CNN-based studies on underwater image enhancement have not taken into account color-space kernel convolution adaptability, which can significantly enhance a model's expressive capacity. Building upon current research on adjusting the color-space size for each perceptual field, this paper introduces a Double-Branch Multi-Perception Kernel Adaptive (DBMKA) model. The DBMKA module is constructed from two perceptual branches that adapt the kernels according to channel features and local image entropy. Additionally, considering the pronounced attenuation of the red channel in underwater images, a Dependency-Capturing Feature Jump Connection (DCFJC) module has been designed to capture the red channel's dependence on the blue and green channels for compensation; its skip mechanism effectively preserves color contextual information. To better utilize the extracted features, a Cross-Level Attention Feature Fusion (CLAFF) module has been designed. With these three modules, the network can effectively enhance various types of underwater images. Qualitative and quantitative evaluations were conducted on the UIEB and EUVP datasets. In the color-correction comparison experiments, our method demonstrated a more uniform red-channel distribution across all gray levels, maintaining color consistency and naturalness. Regarding image information entropy (IIE) and average gradient (AG), the data confirmed our method's superiority in preserving image details. Furthermore, the proposed method showed performance improvements exceeding 10% on other metrics such as MSE and UCIQE, further validating its effectiveness and accuracy.
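As a rough illustration of one of the two perception cues driving the kernel adaptation, local image entropy over a patch can be computed as below. This is an assumed simplification: the histogram bin count and the normalized [0, 1] gray range are choices of this sketch, not values from the paper.

```python
import numpy as np

def local_entropy(patch, bins=16):
    """Shannon entropy (bits) of the gray-level distribution in a patch,
    one plausible form of the 'local image entropy' perception branch.

    patch: array of gray values normalized to [0, 1].
    """
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before the log
    return float(-(p * np.log2(p)).sum())
```

A flat patch yields zero entropy, while a patch spreading evenly over all bins approaches `log2(bins)`; a kernel-adaptive branch could use this scalar to pick a larger receptive field in textured regions.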

Citations: 0
Multi-threshold image segmentation using new strategies enhanced whale optimization for lupus nephritis pathological images
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-20 | DOI: 10.1016/j.displa.2024.102799
Jinge Shi , Yi Chen , Chaofan Wang , Ali Asghar Heidari , Lei Liu , Huiling Chen , Xiaowei Chen , Li Sun

Lupus Nephritis (LN) is considered the most prevalent form of systemic lupus erythematosus. Medical imaging plays an important role in diagnosing and treating LN, helping doctors accurately assess the extent and severity of lesions. However, relying solely on visual observation and judgment can introduce subjectivity and errors, especially for complex pathological images. Image segmentation techniques are used to differentiate various tissues and structures in medical images to assist doctors in diagnosis. Multi-threshold Image Segmentation (MIS) has gained widespread recognition for its direct and practical application, but existing MIS methods still have shortcomings. Therefore, this study combines non-local means, a 2D histogram, and 2D Renyi's entropy to improve the performance of MIS. Additionally, this study introduces an improved variant of the Whale Optimization Algorithm (GTMWOA) to optimize the aforementioned MIS method and reduce algorithm complexity. GTMWOA fuses Gaussian Exploration (GE), Topology Mapping (TM), and Magnetic Liquid Climbing (MLC). The GE effectively amplifies the algorithm's proficiency in local exploration and quickens the convergence rate. The TM facilitates the algorithm in escaping local optima, while the MLC mechanism emulates the physical phenomenon of magnetic liquid climbing, refining the algorithm's convergence precision. This study conducted an extensive series of tests on the IEEE CEC 2017 benchmark functions to demonstrate the superior performance of GTMWOA on intricate optimization problems, and executed experiments on Berkeley images and LN images to verify the superiority of GTMWOA in MIS. The ultimate outcomes of the MIS experiments substantiate the algorithm's advanced capabilities and robustness in handling complex optimization problems.
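The objective that such a multi-threshold method hands to the optimizer can be sketched with a 1-D simplification of the paper's 2-D Renyi criterion: score a set of thresholds by the summed Renyi entropies of the gray-level classes they induce. The function names and the 1-D histogram form are assumptions of this sketch, not the paper's 2-D formulation.

```python
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy H_alpha(p) = ln(sum p_i^alpha) / (1 - alpha), alpha != 1."""
    p = p[p > 0]
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

def threshold_objective(hist, thresholds, alpha=2.0):
    """Sum of Renyi entropies of the classes the thresholds carve out of
    a gray-level histogram; a metaheuristic (e.g. whale optimization)
    would search for the thresholds maximizing this score."""
    p = hist / hist.sum()
    edges = [0] + sorted(thresholds) + [len(p)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = p[lo:hi]
        w = seg.sum()
        if w > 0:                       # skip empty classes
            total += renyi_entropy(seg / w, alpha)
    return total
```

For a flat 8-bin histogram, a single threshold at bin 4 gives two uniform 4-bin classes, each with entropy ln 4, so the objective equals 2 ln 4.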

Citations: 0
A unified architecture for super-resolution and segmentation of remote sensing images based on similarity feature fusion
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-20 | DOI: 10.1016/j.displa.2024.102800
Lunqian Wang , Xinghua Wang , Weilin Liu , Hao Ding , Bo Xia , Zekai Zhang , Jinglin Zhang , Sen Xu

Image resolution has an important impact on segmentation accuracy. Integrating super-resolution (SR) techniques into the semantic segmentation of remote sensing images improves precision and accuracy, especially when the images are blurred. In this paper, a novel and efficient SR semantic segmentation network (SRSEN) is designed by taking advantage of the similarity between SR and segmentation tasks in feature processing. SRSEN consists of a multi-scale feature encoder, an SR fusion decoder, and a multi-path feature refinement block, which adaptively establishes the feature associations between segmentation and SR tasks to improve the segmentation accuracy of blurred images. Experiments show that the proposed method achieves higher segmentation accuracy on blurred images than state-of-the-art models. Specifically, the mIoU of the proposed SRSEN is 3%–6% higher than other state-of-the-art models on the low-resolution LoveDa, Vaihingen, and Potsdam datasets.
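For reference, the mIoU figure quoted above is the per-class intersection-over-union averaged over classes. A minimal sketch (this version simply skips classes absent from both maps and omits the ignore-index handling common in segmentation toolkits):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for two label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                  # class present in pred or target
            ious.append(inter / union)
    return float(np.mean(ious))
```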

Citations: 0
BiF-DETR: Remote sensing object detection based on Bidirectional information fusion
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-19 | DOI: 10.1016/j.displa.2024.102802
Zhijing Xu, Chao Wang, Kan Huang

Remote Sensing Object Detection (RSOD) is a fundamental task in the field of remote sensing image processing. The complexity of the background, the diversity of object scales, and the locality limitation of Convolutional Neural Networks (CNNs) present specific challenges for RSOD. In this paper, an innovative hybrid detector, the Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), is proposed to mitigate these issues. Specifically, BiF-DETR takes the anchor-free detection network CenterNet as its baseline, designs the feature-extraction backbone in parallel, extracts local feature details using CNNs, and obtains global information and long-range dependencies using a Transformer branch. A Bidirectional Information Fusion (BIF) module is elaborately designed to reduce the semantic differences between different styles of feature maps through multi-level iterative information interactions, fully utilizing the complementary advantages of different detectors. Additionally, Coordination Attention (CA) is introduced to enable the detection network to focus on the saliency information of small objects. To address the insufficient diversity of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA) is designed to improve the robustness and generalization ability of the model. Comparative experiments with other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The experimental results reveal that the performance of the proposed method is state-of-the-art, with mAP reaching 77.43% and 94.75%, respectively, far exceeding the other 25 competitive methods.
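The Coordination Attention mentioned above is in the spirit of coordinate attention: pool along each spatial axis separately so positional information is kept per direction, then gate the feature map with the two resulting descriptors. A parameter-free sketch follows; the real mechanism uses learned 1x1 convolutions and a shared bottleneck, so the plain sigmoid gates here are an assumption of this illustration.

```python
import numpy as np

def coordinate_attention(x):
    """Direction-aware gating sketch for a feature map x of shape (C, H, W):
    average-pool along W and along H, squash each pooled descriptor with a
    sigmoid, and reweight the input by both gates via broadcasting."""
    pool_h = x.mean(axis=2, keepdims=True)        # (C, H, 1): row descriptor
    pool_w = x.mean(axis=1, keepdims=True)        # (C, 1, W): column descriptor
    gate_h = 1.0 / (1.0 + np.exp(-pool_h))        # sigmoid gate per row
    gate_w = 1.0 / (1.0 + np.exp(-pool_w))        # sigmoid gate per column
    return x * gate_h * gate_w                    # broadcasts back to (C, H, W)
```

Because one gate varies only along H and the other only along W, a strong response at row i and column j amplifies exactly position (i, j), which is why this family of attention helps localize small objects.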

Citations: 0
FSNet: A dual-domain network for few-shot image classification
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-14 | DOI: 10.1016/j.displa.2024.102795
Xuewen Yan, Zhangjin Huang

Few-shot learning is a challenging task that aims to learn and identify novel classes from a limited number of unseen labeled samples. Previous work has focused primarily on extracting features solely in the spatial domain of images. However, the compressed representation in the frequency domain, which contains rich pattern information, is a powerful tool in the field of signal processing. Combining the frequency and spatial domains to obtain richer information can effectively alleviate the overfitting problem. In this paper, we propose a dual-domain combined model called Frequency Space Net (FSNet), which preprocesses input images simultaneously in both the spatial and frequency domains, extracts spatial and frequency information through two feature extractors, and fuses them into a composite feature for image classification tasks. We start from a different view of frequency analysis, linking conventional average pooling to the Discrete Cosine Transform (DCT), and generalize the compression of the attention mechanism in the frequency domain. Consequently, we propose a novel Frequency Channel Spatial (FCS) attention mechanism. Extensive experiments demonstrate that frequency and spatial information are complementary in few-shot image classification, improving the performance of the model. Our method outperforms state-of-the-art approaches on miniImageNet and CUB.
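The link between average pooling and the DCT that this line of work starts from is concrete: the k = 0 (DC) coefficient of the 1-D DCT-II is, up to a scale factor, the mean of the signal, so global average pooling keeps only the lowest-frequency component and discards the rest. A small check (unnormalized DCT-II basis; scaling conventions vary by library):

```python
import numpy as np

def dct_component(x, k):
    """Projection of signal x onto the k-th unnormalized 1-D DCT-II basis:
    sum_i x[i] * cos(pi * k * (i + 0.5) / n)."""
    n = len(x)
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return float((x * basis).sum())

# For k = 0 the basis is all ones, so the coefficient equals n * mean(x):
# average pooling is the DC term, and higher k pick up pattern detail
# that pooling throws away.
```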

Citations: 0
Reinforcement learning path planning method incorporating multi-step Hindsight Experience Replay for lightweight robots
IF 3.7 | CAS Tier 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-07-14 | DOI: 10.1016/j.displa.2024.102796
Jiaqi Wang, Huiyan Han, Xie Han, Liqun Kuang, Xiaowen Yang

Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks such as autonomous driving, which makes their tasks easier to execute. Meanwhile, path planning tasks using Deep Reinforcement Learning (DRL) are commonly sparse-reward problems with limited data utilization, posing challenges in obtaining meaningful rewards during training and consequently resulting in slow or difficult training. In response to these challenges, this paper introduces a lightweight end-to-end path planning algorithm employing hindsight experience replay (HER). Initially, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action space and state space to a representative low-dimensional action space. At the same time, we improve the network structure to decouple the model's navigation and obstacle-avoidance modules to meet the requirements of a lightweight design. Subsequently, we integrate HER and curriculum learning (CL) to tackle issues related to inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the enhanced training efficiency of the refined algorithm, we conducted tests within diverse Gazebo simulation environments. Results of the experiments reveal noteworthy enhancements in critical metrics, including success rate and training efficiency. To further ascertain the enhanced algorithm's generalization capability, we evaluate its performance in "never-before-seen" simulation environments. Ultimately, we deploy the trained model onto a real lightweight robot for validation. The experimental outcomes indicate the model's competence in successfully executing the path planning task, even on a small robot with constrained computational resources.
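The multi-step hindsight idea can be sketched as follows: replace each transition's goal with a state the agent actually reached a few steps later, so rollouts that never hit the original goal still yield reward signal. Everything here (the tuple layout, the clipped n-step "future" goal choice, the -1/0 sparse reward) is an assumed minimal form for illustration, not the paper's MS-HER.

```python
import numpy as np

def her_relabel(episode, n_step=3):
    """Relabel an episode of (state, action, next_state) transitions:
    substitute each goal with the state achieved n_step transitions later
    (clipped at the episode end) and assign a sparse reward of 0 when the
    transition already reaches that hindsight goal, else -1."""
    relabeled = []
    for t, (s, a, s_next) in enumerate(episode):
        idx = min(t + n_step, len(episode) - 1)
        new_goal = episode[idx][2]                 # achieved state as new goal
        reward = 0.0 if np.array_equal(s_next, new_goal) else -1.0
        relabeled.append((s, a, new_goal, reward))
    return relabeled
```

With the single-step variant (`n_step = 1`) every relabeled goal is nearby and easy, while larger `n_step` spaces the hindsight goals further along the path; that trade-off is what makes the multi-step form attractive for path planning.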

{"title":"Reinforcement learning path planning method incorporating multi-step Hindsight Experience Replay for lightweight robots","authors":"Jiaqi Wang,&nbsp;Huiyan Han,&nbsp;Xie Han,&nbsp;Liqun Kuang,&nbsp;Xiaowen Yang","doi":"10.1016/j.displa.2024.102796","DOIUrl":"10.1016/j.displa.2024.102796","url":null,"abstract":"<div><p>Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks like autonomous driving, making their task execution more easily. Meanwhile, path planning tasks using Deep Reinforcement Learning(DRL) are commonly sparse reward problems with limited data utilization, posing challenges in obtaining meaningful rewards during training, consequently resulting in slow or challenging training. In response to these challenges, our paper introduces a lightweight end-to-end path planning algorithm employing with hindsight experience replay(HER). Initially, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action space and state space to the representative low-dimensional action space. At the same time, we improve the network structure to decouple the model navigation and obstacle avoidance module to meet the requirements of lightweight. Subsequently, we integrate HER and curriculum learning (CL) to tackle issues related to inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the enhanced training efficiency of the refined algorithm, we conducted tests within diverse Gazebo simulation environments. Results of the experiments reveal noteworthy enhancements in critical metrics, including success rate and training efficiency. 
To further ascertain the enhanced algorithm’s generalization capability, we evaluate its performance in some ”never-before-seen” simulation environment. Ultimately, we deploy the trained model onto a real lightweight robot for validation. The experimental outcomes indicate the model’s competence in successfully executing the path planning task, even on a small robot with constrained computational resources.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102796"},"PeriodicalIF":3.7,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141690713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Reduction of short-time image sticking in organic light-emitting diode display through transient analysis of low-temperature polycrystalline silicon thin-film transistor
IF 3.7, CAS Region 2 (Engineering & Technology), Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-07-09. DOI: 10.1016/j.displa.2024.102794
Jiwook Hong , Jaewon Lim , Jongwook Jeon

Accurate compensation operation of the low-temperature polycrystalline-silicon (LTPS) thin-film transistor (TFT) in pixel circuits is crucial to achieving steady and uniform luminance in organic light-emitting diode (OLED) display panels. However, the device characteristics fluctuate over time due to various traps in the LTPS TFT and at its interface with the gate insulator, resulting in abnormal phenomena such as short-time image sticking and luminance fluctuation, which degrade display quality during image changes. Considering these phenomena, transient analysis was conducted through device simulation to optimize the pixel compensation circuit. In particular, we analyzed the behavior of traps within the LTPS TFT in correlation with the compensation circuit's operation and, based on this, proposed a methodology for designing a reset-voltage scheme for the driver TFT that reduces the image-sticking phenomenon.
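The link between trap relaxation and short-time image sticking can be illustrated with a deliberately simple first-order model. The paper relies on full transient device simulation; the exponential detrapping law, the time constants, and the idea of representing a reset pulse as a shorter detrapping constant below are all toy assumptions for illustration only.

```python
import math

def residual_vth_shift(dv0, t, tau):
    """First-order detrapping: the residual threshold-voltage shift decays
    as dVth(t) = dVth0 * exp(-t / tau)."""
    return dv0 * math.exp(-t / tau)

# Illustrative numbers only: a reset bias that accelerates detrapping is
# modeled here as a shorter time constant for the driver TFT.
frame_time = 16.7e-3                              # one 60 Hz frame, in seconds
no_reset = residual_vth_shift(0.1, frame_time, tau=50e-3)
with_reset = residual_vth_shift(0.1, frame_time, tau=5e-3)
```

A smaller residual shift after one frame means the previous image's drive history contributes less error to the next frame's luminance, i.e. less short-time image sticking.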

{"title":"Reduction of short-time image sticking in organic light-emitting diode display through transient analysis of low-temperature polycrystalline silicon thin-film transistor","authors":"Jiwook Hong ,&nbsp;Jaewon Lim ,&nbsp;Jongwook Jeon","doi":"10.1016/j.displa.2024.102794","DOIUrl":"10.1016/j.displa.2024.102794","url":null,"abstract":"<div><p>Accurate compensation operation of low-temperature polycrystalline-silicon (LTPS) thin-film transistor (TFT) in pixel circuits is crucial to achieve steady and uniform luminance in organic light-emitting diode (OLED) display panels. However, the device characteristics fluctuate over time due to various traps in the LTPS thin film transistor and at the interface with the gate insulator, resulting in abnormal phenomena such as short-time image sticking and luminance fluctuation, which degrade display quality during image change. Considering these phenomena, transient analysis was conducted through device simulation to optimize the pixel compensation circuit. 
In particular, we analyzed the behavior of traps within LTPS TFT in correlation with compensation circuit operation, and based on this, proposed a methodology for designing a reset voltage scheme for the driver TFT to reduce the image sticking phenomenon.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102794"},"PeriodicalIF":3.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141938224001586/pdfft?md5=af589a6e358a315d9e0495f42299ea93&pid=1-s2.0-S0141938224001586-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141697954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MSAug: Multi-Strategy Augmentation for rare classes in semantic segmentation of remote sensing images
IF 3.7, CAS Region 2 (Engineering & Technology), Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-07-08. DOI: 10.1016/j.displa.2024.102779
Zhi Gong , Lijuan Duan , Fengjin Xiao , Yuxi Wang

Recently, remote sensing images have been widely used in many scenarios and have gradually become a focus of public attention. Nevertheless, the limited annotation of scarce classes severely reduces segmentation performance, a phenomenon that is especially prominent in remote sensing image segmentation. Given this, we focus on image fusion and model feedback, proposing a multi-strategy method called MSAug to address the class-imbalance problem in remote sensing. Firstly, we crop rare-class images multiple times based on prior knowledge at the image-patch level to provide more balanced samples. Secondly, we design an adaptive image enhancement module at the model-feedback level to accurately classify rare classes at each stage and dynamically paste and mask different classes to further improve the model's recognition capabilities. The MSAug method is highly flexible and plug-and-play. Experimental results on remote sensing image segmentation datasets show that adding MSAug to any remote sensing semantic segmentation network brings varying degrees of performance improvement.
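The first strategy — oversampling patch-level crops around rare classes — can be sketched as follows. This is an illustrative sketch only: the crop size, the per-class crop count, and the rule of centering crops on rare-class pixels are assumptions, not the paper's exact procedure.

```python
import numpy as np

def oversample_rare_crops(image, mask, rare_ids, crop=64, n_crops=4, seed=0):
    """Take extra fixed-size crops centered on pixels of rare classes so a
    training batch contains more balanced samples."""
    rng = np.random.default_rng(seed)
    crops = []
    for cid in rare_ids:
        ys, xs = np.nonzero(mask == cid)
        if len(ys) == 0:
            continue                         # class absent from this tile
        for _ in range(n_crops):
            i = rng.integers(len(ys))
            half = crop // 2
            # clamp so the window stays inside the tile yet still covers the pixel
            y0 = int(np.clip(ys[i] - half, 0, mask.shape[0] - crop))
            x0 = int(np.clip(xs[i] - half, 0, mask.shape[1] - crop))
            crops.append((image[y0:y0 + crop, x0:x0 + crop],
                          mask[y0:y0 + crop, x0:x0 + crop]))
    return crops
```

Each returned (image, mask) pair is guaranteed to contain at least one rare-class pixel, so appending these crops to a batch raises the effective sampling frequency of scarce classes.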

{"title":"MSAug: Multi-Strategy Augmentation for rare classes in semantic segmentation of remote sensing images","authors":"Zhi Gong ,&nbsp;Lijuan Duan ,&nbsp;Fengjin Xiao ,&nbsp;Yuxi Wang","doi":"10.1016/j.displa.2024.102779","DOIUrl":"https://doi.org/10.1016/j.displa.2024.102779","url":null,"abstract":"<div><p>Recently, remote sensing images have been widely used in many scenarios, gradually becoming the focus of social attention. Nevertheless, the limited annotation of scarce classes severely reduces segmentation performance. This phenomenon is more prominent in remote sensing image segmentation. Given this, we focus on image fusion and model feedback, proposing a multi-strategy method called MSAug to address the remote sensing imbalance problem. Firstly, we crop rare class images multiple times based on prior knowledge at the image patch level to provide more balanced samples. Secondly, we design an adaptive image enhancement module at the model feedback level to accurately classify rare classes at each stage and dynamically paste and mask different classes to further improve the model’s recognition capabilities. The MSAug method is highly flexible and can be plug-and-play. Experimental results on remote sensing image segmentation datasets show that adding MSAug to any remote sensing image semantic segmentation network can bring varying degrees of performance improvement.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102779"},"PeriodicalIF":3.7,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ADS-VQA: Adaptive sampling model for video quality assessment
IF 3.7, CAS Region 2 (Engineering & Technology), Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE. Pub Date: 2024-07-04. DOI: 10.1016/j.displa.2024.102792
Shuaibo Cheng, Xiaopeng Li, Zhaoyuan Zeng, Jia Yan

No-reference video quality assessment (NR-VQA) for user-generated content (UGC) plays a crucial role in ensuring the quality of video services. Although some works have achieved impressive results, their performance-complexity trade-off is still sub-optimal. On the one hand, overly complex network structures and additional inputs require more computing resources. On the other hand, simple sampling methods tend to overlook the temporal characteristics of videos, degrading local textures and potentially distorting the thematic content, which in turn causes the performance of VQA techniques to decline. Therefore, in this paper we propose an enhanced NR-VQA model, the Adaptive Sampling Strategy for Video Quality Assessment (ADS-VQA). Temporally, we conduct non-uniform sampling on videos utilizing features from the lateral geniculate nucleus (LGN) to capture their temporal characteristics. Spatially, a dual-branch structure is designed to supplement spatial features across different levels. One branch samples patches at their raw resolution, effectively preserving local texture detail. The other branch performs a downsampling process guided by saliency cues, attaining global semantic features at a diminished computational expense. Experimental results demonstrate that the proposed approach achieves higher performance at a lower computational cost than most state-of-the-art VQA models on four popular VQA databases.
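The temporal half of the idea — sampling frames non-uniformly so that segments with more change receive more samples — can be sketched with a simple motion-weighted scheme. The frame-difference weight below is a crude stand-in for the LGN-inspired features the abstract describes; it is not the paper's sampler.

```python
import numpy as np

def adaptive_frame_indices(frames, n_samples=8):
    """Pick frame indices in proportion to inter-frame change.

    frames: array of shape (T, H, W); returns n_samples sorted indices.
    Segments where consecutive frames differ strongly get denser sampling;
    a static clip degrades gracefully to near-uniform sampling.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)).mean(axis=(1, 2))
    weights = np.concatenate([[diffs[0]], diffs]) + 1e-8   # one weight per frame
    cdf = np.cumsum(weights) / weights.sum()
    # invert the CDF at evenly spaced quantiles: high-change regions,
    # where the CDF rises fastest, collect more of the quantiles
    quantiles = (np.arange(n_samples) + 0.5) / n_samples
    return np.searchsorted(cdf, quantiles)
```

Only the selected frames are passed to the spatial branches, which is where the computational saving over dense uniform sampling comes from.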

{"title":"ADS-VQA: Adaptive sampling model for video quality assessment","authors":"Shuaibo Cheng,&nbsp;Xiaopeng Li,&nbsp;Zhaoyuan Zeng,&nbsp;Jia Yan","doi":"10.1016/j.displa.2024.102792","DOIUrl":"10.1016/j.displa.2024.102792","url":null,"abstract":"<div><p>No-reference video quality assessment (NR-VQA) for user-generated content (UGC) plays a crucial role in ensuring the quality of video services. Although some works have achieved impressive results, their performance-complexity trade-off is still sub-optimal. On the one hand, overly complex network structures and additional inputs require more computing resources. On the other hand, the simple sampling methods have tended to overlook the temporal characteristics of the videos, resulting in the degradation of local textures and potential distortion of the thematic content, consequently leading to the performance decline of the VQA technologies. Therefore, in this paper, we propose an enhanced NR-VQA model, known as the Adaptive Sampling Strategy for Video Quality Assessment (ADS-VQA). Temporally, we conduct non-uniform sampling on videos utilizing features from the lateral geniculate nucleus (LGN) to capture the temporal characteristics of videos. Spatially, a dual-branch structure is designed to supplement spatial features across different levels. The one branch samples patches at their raw resolution, effectively preserving the local texture detail. The other branch performs a downsampling process guided by saliency cues, attaining global semantic features with a diminished computational expense. 
Experimental results demonstrate that the proposed approach achieves high performance at a lower computational cost than most state-of-the-art VQA models on four popular VQA databases.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102792"},"PeriodicalIF":3.7,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0