Pub Date : 2024-07-20DOI: 10.1016/j.displa.2024.102800
Lunqian Wang , Xinghua Wang , Weilin Liu , Hao Ding , Bo Xia , Zekai Zhang , Jinglin Zhang , Sen Xu
The resolution of an image has an important impact on the accuracy of its segmentation. Integrating super-resolution (SR) techniques into the semantic segmentation of remote sensing images improves precision and accuracy, especially when the images are blurred. In this paper, a novel and efficient SR semantic segmentation network (SRSEN) is designed by exploiting the similarity between the SR and segmentation tasks in feature processing. SRSEN consists of a multi-scale feature encoder, an SR fusion decoder, and a multi-path feature refinement block, which adaptively establish feature associations between the segmentation and SR tasks to improve the segmentation accuracy of blurred images. Experiments show that the proposed method achieves higher segmentation accuracy on blurred images than state-of-the-art models. Specifically, the mIoU of SRSEN is 3%–6% higher than that of other state-of-the-art models on the low-resolution LoveDA, Vaihingen, and Potsdam datasets.
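The reported metric, mIoU, averages per-class intersection-over-union across all classes. A minimal sketch of the computation on flattened label maps (illustrative only, not the authors' evaluation code):

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over all classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 2-class example: class 0 has IoU 1/2, class 1 has IoU 2/3.
pred = [0, 0, 1, 1]
gt   = [0, 1, 1, 1]
print(mean_iou(pred, gt, 2))  # ≈ 7/12
```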
Title: "A unified architecture for super-resolution and segmentation of remote sensing images based on similarity feature fusion." Displays, vol. 84, Article 102800.
Pub Date : 2024-07-19DOI: 10.1016/j.displa.2024.102802
Zhijing Xu, Chao Wang, Kan Huang
Remote Sensing Object Detection (RSOD) is a fundamental task in remote sensing image processing. Complex backgrounds, diverse object scales, and the locality limitation of Convolutional Neural Networks (CNNs) pose specific challenges for RSOD. In this paper, an innovative hybrid detector, the Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), is proposed to mitigate these issues. Specifically, BiF-DETR takes the anchor-free detection network CenterNet as its baseline, designs a parallel feature extraction backbone, extracts local feature details with CNNs, and captures global information and long-range dependencies with a Transformer branch. A Bidirectional Information Fusion (BIF) module is carefully designed to reduce the semantic differences between the two styles of feature maps through multi-level iterative information interaction, fully exploiting the complementary advantages of the two branches. Additionally, Coordination Attention (CA) is introduced to enable the detection network to focus on the saliency information of small objects. To address the insufficient diversity of remote sensing images during training, Cascade Mixture Data Augmentation (CMDA) is designed to improve the robustness and generalization ability of the model. Comparative experiments against other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The experimental results show that the proposed method is state-of-the-art, with mAP reaching 77.43% and 94.75%, respectively, far exceeding the 25 competing methods.
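The mAP figures above rest on box-level intersection-over-union matching between predictions and ground truth. A minimal IoU sketch for axis-aligned boxes (the building block of mAP, not the paper's code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```

A prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5; mAP then averages precision over recall levels and classes.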
Title: "BiF-DETR: Remote sensing object detection based on Bidirectional information fusion." Displays, vol. 84, Article 102802. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0141938224001665/pdfft?md5=e3ed1b94823f012220f1a30a72ed7985&pid=1-s2.0-S0141938224001665-main.pdf
Pub Date : 2024-07-14DOI: 10.1016/j.displa.2024.102795
Xuewen Yan, Zhangjin Huang
Few-shot learning is a challenging task that aims to learn and identify novel classes from a limited number of unseen labeled samples. Previous work has focused primarily on extracting features in the spatial domain of images. However, the compressed representation in the frequency domain, which contains rich pattern information, is a powerful tool in signal processing. Combining the frequency and spatial domains to obtain richer information can effectively alleviate overfitting. In this paper, we propose a dual-domain model called Frequency Space Net (FSNet), which preprocesses input images simultaneously in the spatial and frequency domains, extracts spatial and frequency information through two feature extractors, and fuses them into a composite feature for image classification. We start from a different view of frequency analysis, linking conventional average pooling to the Discrete Cosine Transform (DCT), and generalize this compression into a frequency-domain attention mechanism. Consequently, we propose a novel Frequency Channel Spatial (FCS) attention mechanism. Extensive experiments demonstrate that frequency and spatial information are complementary in few-shot image classification and improve the performance of the model. Our method outperforms state-of-the-art approaches on miniImageNet and CUB.
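The stated link between average pooling and the DCT can be checked directly: the zero-frequency DCT-II basis function is constant, so the first DCT coefficient is, up to normalization, the pooled average. A quick verification, independent of the paper's implementation:

```python
import math

def dct2_coeff(x, k):
    """k-th coefficient of the (unnormalized) DCT-II of a 1-D signal."""
    n = len(x)
    return sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))

x = [3.0, 1.0, 4.0, 1.0, 5.0]
avg = sum(x) / len(x)
# cos(0) = 1 for every sample, so X_0 = N * mean(x):
print(dct2_coeff(x, 0), len(x) * avg)  # 14.0 14.0
```

Higher-order coefficients (k > 0) capture the pattern information that plain average pooling discards, which is the gap a frequency-domain attention mechanism can exploit.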
Title: "FSNet: A dual-domain network for few-shot image classification." Displays, vol. 84, Article 102795.
Pub Date : 2024-07-14DOI: 10.1016/j.displa.2024.102796
Jiaqi Wang, Huiyan Han, Xie Han, Liqun Kuang, Xiaowen Yang
Home service robots prioritize cost-effectiveness and convenience over the precision required for industrial tasks such as autonomous driving, which makes their tasks easier to execute. Meanwhile, path planning with Deep Reinforcement Learning (DRL) is typically a sparse-reward problem with limited data utilization, making it difficult to obtain meaningful rewards during training and consequently making training slow or difficult. In response to these challenges, this paper introduces a lightweight end-to-end path planning algorithm employing hindsight experience replay (HER). First, we optimize the reinforcement learning training process from scratch and map the complex high-dimensional action and state spaces to representative low-dimensional ones. At the same time, we improve the network structure to decouple the navigation and obstacle avoidance modules, meeting the lightweight requirement. Subsequently, we integrate HER and curriculum learning (CL) to tackle inefficient training. Additionally, we propose a multi-step hindsight experience replay (MS-HER) specifically for the path planning task, markedly enhancing both training efficiency and model generalization across diverse environments. To substantiate the improved training efficiency of the refined algorithm, we conducted tests in diverse Gazebo simulation environments. The results reveal noteworthy improvements in critical metrics, including success rate and training efficiency.
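The core of hindsight experience replay, as generally formulated, is relabeling a failed episode with a goal the agent actually reached, so a sparse reward signal becomes informative. A schematic sketch of the relabeling step (hypothetical data layout, not the paper's MS-HER implementation):

```python
def her_relabel(episode, reward_fn):
    """Relabel every transition with the episode's final achieved state as the goal.
    episode: list of (state, action, next_state, goal) tuples."""
    achieved = episode[-1][2]  # final achieved state becomes the hindsight goal
    relabeled = []
    for state, action, next_state, _ in episode:
        r = reward_fn(next_state, achieved)  # reward recomputed against new goal
        relabeled.append((state, action, next_state, achieved, r))
    return relabeled

# Sparse reward: 0 on reaching the goal, -1 otherwise.
reward = lambda s, g: 0.0 if s == g else -1.0
episode = [((0, 0), "right", (1, 0), (5, 5)),
           ((1, 0), "up",    (1, 1), (5, 5))]
out = her_relabel(episode, reward)
print(out[-1][4])  # the final transition now earns reward 0.0
```

A multi-step variant would additionally propagate returns over several relabeled transitions; the specifics of MS-HER are detailed in the paper itself.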
Title: "Reinforcement learning path planning method incorporating multi-step Hindsight Experience Replay for lightweight robots." Displays, vol. 84, Article 102796.
Pub Date : 2024-07-09DOI: 10.1016/j.displa.2024.102794
Jiwook Hong , Jaewon Lim , Jongwook Jeon
Accurate compensation operation of the low-temperature polycrystalline-silicon (LTPS) thin-film transistor (TFT) in pixel circuits is crucial to achieving steady and uniform luminance in organic light-emitting diode (OLED) display panels. However, the device characteristics fluctuate over time due to various traps in the LTPS TFT and at its interface with the gate insulator, resulting in abnormal phenomena such as short-time image sticking and luminance fluctuation, which degrade display quality during image changes. Considering these phenomena, transient analysis was conducted through device simulation to optimize the pixel compensation circuit. In particular, we analyzed the behavior of traps within the LTPS TFT in correlation with compensation circuit operation and, based on this, proposed a methodology for designing a reset voltage scheme for the driver TFT that reduces the image sticking phenomenon.
Title: "Reduction of short-time image sticking in organic light-emitting diode display through transient analysis of low-temperature polycrystalline silicon thin-film transistor." Displays, vol. 84, Article 102794. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0141938224001586/pdfft?md5=af589a6e358a315d9e0495f42299ea93&pid=1-s2.0-S0141938224001586-main.pdf
Pub Date : 2024-07-08DOI: 10.1016/j.displa.2024.102779
Zhi Gong , Lijuan Duan , Fengjin Xiao , Yuxi Wang
Recently, remote sensing images have been widely used in many scenarios and have gradually become a focus of social attention. Nevertheless, the limited annotation of scarce classes severely reduces segmentation performance, a phenomenon that is especially prominent in remote sensing image segmentation. Given this, we focus on image fusion and model feedback, proposing a multi-strategy method called MSAug to address the class imbalance problem in remote sensing. First, at the image patch level, we crop rare-class images multiple times based on prior knowledge to provide more balanced samples. Second, at the model feedback level, we design an adaptive image enhancement module to accurately classify rare classes at each stage and dynamically paste and mask different classes to further improve the model's recognition capability. MSAug is highly flexible and plug-and-play. Experimental results on remote sensing image segmentation datasets show that adding MSAug to any remote sensing semantic segmentation network yields varying degrees of performance improvement.
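The paste step is described only at a high level; the general mask-guided copy-paste idea it builds on can be sketched as follows (toy shapes and class ids are hypothetical, not MSAug's actual procedure):

```python
def paste_rare_class(src_img, src_mask, dst_img, dst_mask, rare_id):
    """Copy pixels of one rare class from a source sample into a destination
    sample, updating both the image and its label mask in place."""
    h, w = len(src_mask), len(src_mask[0])
    for y in range(h):
        for x in range(w):
            if src_mask[y][x] == rare_id:  # only rare-class pixels move
                dst_img[y][x] = src_img[y][x]
                dst_mask[y][x] = rare_id
    return dst_img, dst_mask

src_img  = [[9, 9], [9, 9]]
src_mask = [[0, 3], [0, 3]]   # class 3 is the rare class
dst_img  = [[1, 1], [1, 1]]
dst_mask = [[0, 0], [0, 0]]
img, mask = paste_rare_class(src_img, src_mask, dst_img, dst_mask, 3)
print(mask)  # [[0, 3], [0, 3]]
```

Because the operation touches only images and masks, such an augmentation is naturally plug-and-play with any segmentation network's data pipeline.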
Title: "MSAug: Multi-Strategy Augmentation for rare classes in semantic segmentation of remote sensing images." Displays, vol. 84, Article 102779.
Pub Date : 2024-07-04DOI: 10.1016/j.displa.2024.102792
Shuaibo Cheng, Xiaopeng Li, Zhaoyuan Zeng, Jia Yan
No-reference video quality assessment (NR-VQA) for user-generated content (UGC) plays a crucial role in ensuring the quality of video services. Although some works have achieved impressive results, their performance-complexity trade-off is still sub-optimal. On the one hand, overly complex network structures and additional inputs require more computing resources. On the other hand, simple sampling methods tend to overlook the temporal characteristics of videos, degrading local textures and potentially distorting thematic content, which in turn degrades VQA performance. Therefore, in this paper we propose an enhanced NR-VQA model, the Adaptive Sampling Strategy for Video Quality Assessment (ADS-VQA). Temporally, we sample videos non-uniformly, utilizing features modeled on the lateral geniculate nucleus (LGN) to capture the temporal characteristics of videos. Spatially, a dual-branch structure supplements spatial features across different levels: one branch samples patches at their raw resolution, effectively preserving local texture detail, while the other performs saliency-guided downsampling, attaining global semantic features at reduced computational expense. Experimental results demonstrate that the proposed approach achieves high performance at a lower computational cost than most state-of-the-art VQA models on four popular VQA databases.
Title: "ADS-VQA: Adaptive sampling model for video quality assessment." Displays, vol. 84, Article 102792.
Pub Date : 2024-07-04DOI: 10.1016/j.displa.2024.102793
Zelu Qi, Da Pan, Tianyi Niu, Zefeng Ying, Ping Shi
The success of deep learning in computer vision makes cartoon character detection (CCD) based on object detection a promising means of protecting intellectual property rights. However, due to the lack of suitable cartoon character datasets, CCD remains a little-explored field, and many problems must still be solved to meet the needs of practical applications such as merchandise, advertising, and patent review. In this paper, we propose a new, challenging CCD benchmark dataset, called CCDaS, which consists of 140,339 images of 524 famous cartoon characters from 227 cartoon works, game works, and merchandise innovations. To the best of our knowledge, CCDaS is currently the largest CCD dataset for practical application scenarios. To further study CCD, we also provide a CCD algorithm, multi-path YOLO (MP-YOLO), that achieves accurate detection of multi-scale and facially similar objects in practical application scenarios. Experimental results show that MP-YOLO achieves better detection results on the CCDaS dataset. Comparative and ablation studies further validate the effectiveness of our CCD dataset and algorithm.
Title: "Bridge the gap between practical application scenarios and cartoon character detection: A benchmark dataset and deep learning model." Displays, vol. 84, Article 102793.
Compared with traditional imaging, high dynamic range (HDR) imaging technology can record scene information more accurately, thereby providing users a higher-quality visual experience. Inverse tone mapping is a direct and effective way to realize single-image HDR reconstruction, but it usually suffers from problems such as detail loss, color deviation, and artifacts. To solve these problems, this paper proposes a multi-stage coarse-to-fine progressive enhancement network (named MSPENet) for single-image HDR reconstruction. The entire multi-stage architecture is designed progressively to obtain higher-quality HDR images from coarse to fine, with a mask mechanism used to eliminate the effects of over-exposed regions. Specifically, in the first two stages, two asymmetric U-Nets are constructed to learn multi-scale information from the input image and perform coarse reconstruction. In the third stage, a residual network with a channel attention mechanism is constructed to learn the fusion of progressively transferred multi-level features and perform fine reconstruction. In addition, a multi-stage progressive detail enhancement mechanism is designed, comprising a progressive gated recurrent unit fusion mechanism and a multi-stage feature transfer mechanism. The former fuses progressively transferred features with coarse HDR features to reduce the error-stacking effect of multi-stage networks, while the latter fuses early features to supplement information lost at each stage of feature delivery and combines features from different stages. Extensive experimental results show that, compared to state-of-the-art methods, the proposed method reconstructs higher-quality HDR images and effectively recovers texture and color information in over-exposed regions.
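Over-exposure masks of the kind the abstract mentions are commonly built as a soft ramp on luminance: zero below a saturation threshold, rising to one at pure white. A minimal sketch under that common formulation (the threshold value and exact mask shape used by MSPENet are the paper's own):

```python
def overexposure_mask(luma, t=0.95):
    """Soft over-exposure mask: 0 below threshold t, ramping to 1 at pure white.
    luma: nested list of luminance values in [0, 1]; t is an assumed threshold."""
    return [[max(0.0, (v - t) / (1.0 - t)) for v in row] for row in luma]

row = [0.2, 0.95, 0.975, 1.0]
print(overexposure_mask([row])[0])  # ≈ [0.0, 0.0, 0.5, 1.0]
```

Multiplying network outputs by such a mask lets a reconstruction model concentrate its hallucinated detail on the saturated regions where the input carries no information.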
{"title":"Multi-stage coarse-to-fine progressive enhancement network for single-image HDR reconstruction","authors":"Wei Zhang , Gangyi Jiang , Yeyao Chen , Haiyong Xu , Hao Jiang , Mei Yu","doi":"10.1016/j.displa.2024.102791","DOIUrl":"https://doi.org/10.1016/j.displa.2024.102791","url":null,"abstract":"<div><p>Compared with traditional imaging, high dynamic range (HDR) imaging can record scene information more accurately, thereby providing users with a higher-quality visual experience. Inverse tone mapping is a direct and effective way to realize single-image HDR reconstruction, but it usually suffers from problems such as detail loss, color deviation, and artifacts. To address these problems, this paper proposes a multi-stage coarse-to-fine progressive enhancement network (named MSPENet) for single-image HDR reconstruction. The entire multi-stage architecture is designed in a progressive manner to obtain higher-quality HDR images from coarse to fine, and a mask mechanism is used to eliminate the effects of over-exposed regions. Specifically, in the first two stages, two asymmetric U-Nets are constructed to learn the multi-scale information of the input image and perform coarse reconstruction. In the third stage, a residual network with a channel attention mechanism is constructed to learn the fusion of progressively transferred multi-level features and perform fine reconstruction. In addition, a multi-stage progressive detail enhancement mechanism is designed, comprising a progressive gated recurrent unit fusion mechanism and a multi-stage feature transfer mechanism. The former fuses the progressively transferred features with coarse HDR features to reduce the error-stacking effect caused by multi-stage networks, while the latter fuses early features to supplement information lost during each stage of feature delivery and combines features from different stages. Extensive experimental results show that the proposed method reconstructs higher-quality HDR images and recovers texture and color information in over-exposed regions more effectively than state-of-the-art methods.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102791"},"PeriodicalIF":3.7,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141596622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-28DOI: 10.1016/j.displa.2024.102790
Yao Wang, Yang Lu, Cheng-Yi Shen, Shi-Jian Luo, Long-Yu Zhang
Digital shopping applications and platforms offer consumers a vast array of products with diverse styles and style attributes. Existing literature suggests that style preferences are shaped by consumers’ genders, ages, education levels, and nationalities. In this study, we use eye-tracking technology to argue the feasibility and necessity of self-monitoring as an additional consumer variable affecting product style perception and preference. Three eye-movement experiments were conducted with forty-two participants (twenty males and twenty-two females; Age: M = 22.8, SD = 1.63). The results showed that participants with higher levels of self-monitoring exhibited shorter total fixation durations and fewer fixations while examining images of watch product styles. In addition, gender moderated the effect of self-monitoring: female participants with high self-monitoring ability perceived differences in product styles more rapidly and with greater sensitivity. Overall, the results highlight the utility of self-monitoring as a research variable in product style perception research, as well as its implications for style intelligence classifiers and style neuroimaging.
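The study's central contrast (high self-monitors show shorter total fixation durations and fewer fixations) reduces to a simple group comparison of per-participant eye-tracking metrics. The sketch below illustrates that comparison with invented numbers; the values are not the study's data, only a hypothetical example of the reported pattern.

```python
from statistics import mean

# Hypothetical per-participant metrics:
# (self_monitoring_group, total_fixation_duration_ms, fixation_count)
# These values are fabricated for illustration only.
records = [
    ("high", 2100, 18), ("high", 1950, 16), ("high", 2240, 19),
    ("low", 3050, 27), ("low", 2890, 25), ("low", 3120, 28),
]

def group_means(rows, group):
    """Mean fixation duration and mean fixation count for one group."""
    durs = [d for g, d, _ in rows if g == group]
    counts = [c for g, _, c in rows if g == group]
    return mean(durs), mean(counts)

hi_dur, hi_cnt = group_means(records, "high")
lo_dur, lo_cnt = group_means(records, "low")
# The pattern the abstract reports: shorter and fewer fixations for
# high self-monitors.
print(hi_dur < lo_dur and hi_cnt < lo_cnt)  # True
```

In practice such group differences would be tested with an ANOVA including a gender term, matching the interaction effect the study reports.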
{"title":"Exploring product style perception: A comparative eye-tracking analysis of users across varying levels of self-monitoring","authors":"Yao Wang, Yang Lu, Cheng-Yi Shen, Shi-Jian Luo, Long-Yu Zhang","doi":"10.1016/j.displa.2024.102790","DOIUrl":"https://doi.org/10.1016/j.displa.2024.102790","url":null,"abstract":"<div><p>Digital shopping applications and platforms offer consumers a vast array of products with diverse styles and style attributes. Existing literature suggests that style preferences are shaped by consumers’ genders, ages, education levels, and nationalities. In this study, we use eye-tracking technology to argue the feasibility and necessity of self-monitoring as an additional consumer variable affecting product style perception and preference. Three eye-movement experiments were conducted with forty-two participants (twenty males and twenty-two females; Age: M <span><math><mo>=</mo></math></span> 22.8, SD <span><math><mo>=</mo></math></span> 1.63). The results showed that participants with higher levels of self-monitoring exhibited shorter total fixation durations and fewer fixations while examining images of watch product styles. In addition, gender moderated the effect of self-monitoring: female participants with high self-monitoring ability perceived differences in product styles more rapidly and with greater sensitivity. Overall, the results highlight the utility of self-monitoring as a research variable in product style perception research, as well as its implications for style intelligence classifiers and style neuroimaging.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102790"},"PeriodicalIF":3.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}