Pub Date: 2024-10-30. DOI: 10.1016/j.displa.2024.102862
Fangyuan Zhang, Rukai Wei, Yanzhao Xie, Yangtao Wang, Xin Tan, Lizhuang Ma, Maobin Tang, Lisheng Fan
Prompt learning based on large models shows great potential to reduce training time and resource costs, and it has been progressively applied to visual tasks such as image recognition. Nevertheless, existing prompt learning schemes suffer from either inadequate prompt information from a single modality or insufficient prompt interaction between multiple modalities, resulting in low efficiency and performance. To address these limitations, we propose a Cross-Coupled Prompt Learning (CCPL) architecture, which is designed with two novel components (i.e., a Cross-Coupled Prompt Generator (CCPG) module and a Cross-Modal Fusion (CMF) module) to achieve efficient interaction between visual and textual prompts. Specifically, the CCPG module incorporates a cross-attention mechanism to automatically generate visual and textual prompts, each of which is adaptively updated by the self-attention mechanism in its respective image or text encoder. Furthermore, the CMF module implements a deep fusion to reinforce cross-modal feature interaction at the output layer with the Image–Text Matching (ITM) loss function. We conduct extensive experiments on 8 image datasets. The experimental results verify that our proposed CCPL outperforms the SOTA methods on few-shot image recognition tasks. The source code of this project is released at: https://github.com/elegantTechie/CCPL.
Article: "Cross-coupled prompt learning for few-shot image recognition" (Displays, vol. 85, Article 102862).
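As an editorial illustration of the cross-coupling idea described in the abstract above, the following is a minimal PyTorch-style sketch, not the authors' implementation: learnable visual and textual prompt tokens attend to each other through cross-attention before being handed to their respective encoders. All class names, dimensions, prompt counts, and the residual update are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CrossCoupledPromptGenerator(nn.Module):
    """Hypothetical sketch of cross-coupled prompt generation (names/dims assumed)."""

    def __init__(self, dim=512, n_prompts=4, n_heads=8):
        super().__init__()
        # Learnable prompt tokens for each modality.
        self.visual_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.textual_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        # Cross-attention: each modality's prompts query the other modality's prompts.
        self.visual_from_text = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.text_from_visual = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, batch_size):
        v = self.visual_prompts.unsqueeze(0).expand(batch_size, -1, -1)
        t = self.textual_prompts.unsqueeze(0).expand(batch_size, -1, -1)
        v_upd, _ = self.visual_from_text(v, t, t)   # visual prompts enriched by textual ones
        t_upd, _ = self.text_from_visual(t, v, v)   # textual prompts enriched by visual ones
        # The coupled prompts would then be prepended to the image/text encoder inputs.
        return v + v_upd, t + t_upd

gen = CrossCoupledPromptGenerator()
visual_p, textual_p = gen(batch_size=8)
print(visual_p.shape, textual_p.shape)  # torch.Size([8, 4, 512]) twice
```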
Pub Date: 2024-10-28. DOI: 10.1016/j.displa.2024.102859
Hangwei Chen, Feng Shao, Baoyang Mu, Qiuping Jiang
Arbitrary style transfer (AST) is a distinctive technique for transferring artistic style into content images, with the goal of generating stylized images that approximate real artistic paintings. Thus, it is natural to develop a quantitative evaluation metric that acts like an artist to accurately assess the quality of AST images. Inspired by this, we present an artist-like network (AL-Net) which can analyze the quality of stylized images like an artist, drawing on fine-grained knowledge of artistic painting (e.g., aesthetics, structure, color, texture). Specifically, the AL-Net consists of three sub-networks: an aesthetic prediction network (AP-Net), a content preservation prediction network (CPP-Net), and a style resemblance prediction network (SRP-Net), which can be regarded as specialized feature extractors that leverage professional artistic painting knowledge through pre-training with different labels. To more effectively predict the final overall quality, we apply transfer learning to integrate the pre-trained feature vectors representing different painting elements into an overall vision quality regression. The loss determined by the overall vision label fine-tunes the parameters of AL-Net, and thus our model can establish a tight connection with human perception. Extensive experiments on the AST-IQAD dataset validate that the proposed method achieves state-of-the-art performance.
Article: "Assessing arbitrary style transfer like an artist" (Displays, vol. 85, Article 102859).
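A hedged sketch of the fusion-and-regression step described above: feature vectors from three pre-trained sub-networks (AP-Net, CPP-Net, SRP-Net) are concatenated and mapped to a single overall-quality score. The sub-network interfaces, feature dimension, and regressor head are illustrative assumptions (the real CPP-Net and SRP-Net presumably also receive the content and style references), not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ALNetRegressor(nn.Module):
    """Sketch: fuse features from three pre-trained sub-networks into one quality score."""

    def __init__(self, ap_net, cpp_net, srp_net, feat_dim=512):
        super().__init__()
        # Pre-trained extractors for aesthetics, content preservation, style resemblance.
        self.ap_net, self.cpp_net, self.srp_net = ap_net, cpp_net, srp_net
        self.regressor = nn.Sequential(
            nn.Linear(3 * feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),
        )

    def forward(self, stylized):
        feats = torch.cat(
            [self.ap_net(stylized), self.cpp_net(stylized), self.srp_net(stylized)], dim=1
        )
        return self.regressor(feats)  # overall vision quality prediction

# Usage with a stand-in extractor that maps an image to a 512-d vector:
stub = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 512))
model = ALNetRegressor(stub, stub, stub)
score = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 1)
```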
Pub Date: 2024-10-28. DOI: 10.1016/j.displa.2024.102868
Brandy Murovec, Julia Spaniol, Behrang Keshavarz
Research on the illusion of self-motion (vection) has primarily focused on younger adults, with few studies including older adults. In light of documented age differences in bottom-up and top-down perception and attention, the current study examined the impact of stimulus properties (speed), cognitive factors (expectancy), and a combination of both (stimulus realism) on vection in younger (18–35 years) and older (65+ years) adults. Through manipulation of the study instructions, participants were led to believe that they were either likely or unlikely to experience vection before they were exposed to a rotating visual stimulus designed to induce circular vection. Realism was manipulated by disrupting the global consistency of a visual stimulus composed of an intact 360° panoramic photograph, resulting in two images (intact, scrambled). The speed of the stimulus was also varied (faster, slower). Vection was measured using self-ratings of onset latency, duration, and intensity. Results showed that intact images produced more vection than scrambled images, especially at faster speeds. In contrast, expectation did not significantly impact vection. Overall, these patterns were similar across both age groups, although younger adults reported earlier vection onsets than older adults at faster speeds. These findings suggest that vection results from an interplay of stimulus-driven and cognitive factors in both younger and older adults.
Article: "The role of image realism and expectation in illusory self-motion (vection) perception in younger and older adults" (Displays, vol. 85, Article 102868).
Pub Date: 2024-10-24. DOI: 10.1016/j.displa.2024.102866
Mengfan Lv, Xiwu Shang, Jiajia Wang, Guoping Li, Guozhong Wang
The rapid growth of video data poses a serious challenge to limited bandwidth. Video coding pre-processing technology can remove coding noise without changing the architecture of the codec; therefore, it can improve coding efficiency while ensuring a high degree of compatibility with existing codecs. However, existing pre-processing methods suffer from feature redundancy and lack an effective mechanism to recover high-frequency details. In view of these problems, we propose a Degradation Compensation and Multi-dimensional Reconstruction (DCMR) pre-processing method for video coding to improve compression efficiency. Firstly, we develop a degradation compensation model, which filters the coding noise in the original video and relieves the frame quality degradation caused by transmission. Secondly, we construct a lightweight multi-dimensional feature reconstruction network that combines residual learning and feature distillation; it enhances and refines the key coding-related features in both the spatial and channel dimensions while suppressing irrelevant features. In addition, we design a weighted guided image filter detail enhancement convolution module, which is specifically used to recover the high-frequency details lost in the denoising process. Finally, we introduce an adaptive discrete cosine transform loss to balance coding efficiency and quality. Experimental results demonstrate that, compared with the original H.266/VVC codec, the proposed DCMR achieves BD-rate (VMAF) and BD-rate (MOS) gains of 21.62% and 12.99%, respectively, on the VVC, UVG, and MCL-JCV datasets.
Article: "DCMR: Degradation compensation and multi-dimensional reconstruction based pre-processing for video coding" (Displays, vol. 85, Article 102866).
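The gains above are reported as Bjøntegaard delta rate (BD-rate) with VMAF or MOS as the quality metric. As a reference for how such numbers are typically computed (this is the standard BD-rate procedure, not code from the paper, and the rate/quality points below are made up), a compact NumPy sketch:

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate: average bitrate change (%) of the test codec vs. the
    anchor at equal quality (e.g., VMAF or MOS). Negative means bitrate savings."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Fit cubic polynomials of log-rate as a function of quality.
    p_a = np.polyfit(quality_anchor, log_ra, 3)
    p_t = np.polyfit(quality_test, log_rt, 3)
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Example with made-up rate (kbps) / quality points, four per codec as is typical:
anchor = ([1000, 2000, 4000, 8000], [70.0, 78.0, 84.0, 88.0])
test   = ([ 900, 1800, 3600, 7200], [70.5, 78.5, 84.5, 88.5])
print(round(bd_rate(anchor[0], anchor[1], test[0], test[1]), 2))
```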
Pub Date: 2024-10-23. DOI: 10.1016/j.displa.2024.102863
Jiale Chen, Qiusheng Lian, Baoshun Shi
Low-light image enhancement poses significant challenges due to its ill-posed nature. Recently, deep learning-based methods have attempted to establish a unified mapping relationship between normal-light images and their low-light versions but frequently struggle to capture the intricate variations in brightness conditions. As a result, these methods often suffer from overexposure, underexposure, amplified noise, and distorted colors. To tackle these issues, we propose a brightness-guided normalizing flow framework, dubbed BGFlow, for low-light image enhancement. Specifically, we recognize that low-frequency sub-bands in the wavelet domain carry significant brightness information. To effectively capture the intricate variations in brightness within an image, we design a transformer-based multi-scale wavelet-domain encoder to extract brightness information from the multi-scale features of the low-frequency sub-bands. The extracted brightness feature maps, at different scales, are then injected into the brightness-guided affine coupling layer to guide the training of the conditional normalizing flow module. Extensive experimental evaluations demonstrate the superiority of BGFlow over existing deep learning-based approaches in both qualitative and quantitative assessments. Moreover, we also showcase the exceptional performance of BGFlow on the underwater image enhancement task.
Article: "BGFlow: Brightness-guided normalizing flow for low-light image enhancement" (Displays, vol. 85, Article 102863).
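To make the "brightness-guided affine coupling layer" idea above concrete, here is a minimal conditional affine coupling sketch in PyTorch. The conditioning tensor stands in for the brightness feature maps produced by the wavelet-domain encoder; the channel split, the small convolutional network, and the tanh clamp on the scale are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class BrightnessConditionedCoupling(nn.Module):
    """Sketch of an affine coupling layer conditioned on a brightness feature map."""

    def __init__(self, channels, cond_channels, hidden=64):
        super().__init__()
        self.c1 = channels - channels // 2   # passthrough half
        self.c2 = channels // 2              # transformed half
        self.net = nn.Sequential(
            nn.Conv2d(self.c1 + cond_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * self.c2, 3, padding=1),
        )

    def forward(self, x, brightness):
        x1, x2 = torch.split(x, [self.c1, self.c2], dim=1)
        h = self.net(torch.cat([x1, brightness], dim=1))   # scale/shift from x1 + brightness guide
        log_s, t = h.chunk(2, dim=1)
        log_s = torch.tanh(log_s)                          # keep scales numerically stable
        y2 = x2 * torch.exp(log_s) + t                     # invertible affine transform
        log_det = log_s.flatten(1).sum(dim=1)              # per-sample log|det J| for the flow loss
        return torch.cat([x1, y2], dim=1), log_det

layer = BrightnessConditionedCoupling(channels=12, cond_channels=4)
y, log_det = layer(torch.randn(2, 12, 64, 64), torch.randn(2, 4, 64, 64))
```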
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102861
Sanghyeon Kim, Uijong Ju
People watching video displays for long durations experience visual fatigue and other symptoms associated with visual discomfort. Fatigue-reduction techniques are often applied but may degrade the immersive experience. To adjust such techniques appropriately, the changes in visual fatigue over time must be analyzed. However, conventional methods for assessing visual fatigue are inadequate because they rely entirely on post-task surveys, which cannot easily capture dynamic changes. This study employed a dynamic assessment method for evaluating visual fatigue in real time. Using a joystick, participants continuously evaluated subjective fatigue whenever they perceived changes. A Simulator Sickness Questionnaire (SSQ) validated the results, which indicated significant correlations between the dynamic assessments and the SSQ across five items associated with visual discomfort. Furthermore, we explored the potential relationship between dynamic visual fatigue and objective video features, e.g., optical flow and the V-value (brightness) of the hue/saturation/value (HSV) color space, which together represent the motion and brightness of the video. The results revealed that dynamic visual fatigue significantly correlated with both the optical flow and the V-value. Moreover, based on machine learning models, we determined that changes in visual fatigue can be predicted from the optical flow and V-value. Overall, the results validate that dynamic assessment methods can form a reliable baseline for real-time prediction of visual fatigue.
Article: "Dynamic assessment of visual fatigue during video watching: Validation of dynamic rating based on post-task ratings and video features" (Displays, vol. 85, Article 102861).
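The objective video features correlated with fatigue above (per-frame optical flow and HSV V-value) can be extracted as in the following OpenCV sketch; the Farnebäck parameters and the simple per-frame averaging are assumptions for illustration, not the authors' exact pipeline.

```python
import cv2
import numpy as np

def video_motion_brightness(path):
    """Per-frame mean optical-flow magnitude and mean HSV V-value (brightness)."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"cannot read {path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flow_mag, v_mean = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames (typical parameter values).
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flow_mag.append(float(np.linalg.norm(flow, axis=2).mean()))
        # The V channel of HSV encodes brightness.
        v_mean.append(float(cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[:, :, 2].mean()))
        prev_gray = gray
    cap.release()
    return np.array(flow_mag), np.array(v_mean)
```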
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102860
Ho Sub Lee, Sung In Cho
Image deblurring, which eliminates blurring artifacts to recover details from a given input image, is an important task in the computer vision field. Recently, attention mechanisms combined with deep neural networks (DNNs) have demonstrated promising performance in image deblurring. However, such methods have difficulty learning the complex relationships between blurry and sharp images while balancing spatial detail and high-level contextualized information. Moreover, most existing attention-based DNN methods fail to selectively exploit the information from attention and non-attention branches. To address these challenges, we propose a new approach called Multi-Scale Attention in Attention (MSAiA) for image deblurring. MSAiA incorporates dynamic weight generation by leveraging the joint dependencies of channel and spatial information, allowing for adaptive changes to the weight values of the attention and non-attention branches. In contrast to existing attention mechanisms that primarily consider channel or spatial dependencies and do not adequately utilize the information from attention and non-attention branches, our proposed AiA design combines channel-spatial attention. This attention mechanism effectively utilizes the dependencies between channel and spatial information to allocate weight values to the attention and non-attention branches, enabling the full utilization of information from both branches. Consequently, the attention branch can more effectively incorporate useful information, while the non-attention branch avoids less useful information. Additionally, we employ a novel multi-scale neural network that learns the relationships between blurring artifacts and the original sharp image by further exploiting multi-scale information. The experimental results prove that the proposed MSAiA achieves superior deblurring performance compared with state-of-the-art methods.
Article: "Multi-scale attention in attention neural network for single image deblurring" (Displays, vol. 85, Article 102860).
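A simplified sketch of an "attention in attention" block that adaptively weights an attention branch against a non-attention branch, in the spirit of the design described above. The specific branch definitions, the gating network, and the use of channel-only attention are assumptions made for brevity; the paper's module combines channel and spatial attention at multiple scales.

```python
import torch
import torch.nn as nn

class AttentionInAttention(nn.Module):
    """Sketch: adaptively re-weight an attention branch and a non-attention branch."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Attention branch (channel attention only, for brevity).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Non-attention branch: a plain convolution.
        self.non_attn = nn.Conv2d(channels, channels, 3, padding=1)
        # Gate that generates per-channel weights for the two branches.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 2 * channels, 1))

    def forward(self, x):
        a = x * self.attn(x)                                   # attention branch
        n = self.non_attn(x)                                   # non-attention branch
        w = self.gate(x).reshape(x.size(0), 2, -1, 1, 1).softmax(dim=1)
        return w[:, 0] * a + w[:, 1] * n                       # dynamically weighted fusion

block = AttentionInAttention(64)
out = block(torch.randn(1, 64, 128, 128))                      # same shape as the input
```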
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102857
Yuan Zhang, Zixi Wang, Xiaodi Guan, Lijun He, Fan Li
In the emerging Internet of Things (IoT) paradigm, mobile cloud inference serves as an efficient application framework that relieves the computation and storage burden on resource-constrained mobile devices by offloading the workload to cloud servers. However, mobile cloud inference faces computation, communication, and privacy challenges in ensuring efficient system inference and protecting the privacy of the information collected by mobile users. To deploy deep neural networks (DNNs) with large capacity, we propose splitting computing (SC), in which the entire model is divided into two parts executed on the mobile and cloud ends, respectively. However, the transmission of intermediate data poses a bottleneck to system performance. This paper first demonstrates the privacy issue arising from machine-analysis-oriented intermediate features. We conduct a preliminary experiment to intuitively reveal the latent potential for enhancing the privacy-preserving ability of the initial feature. Motivated by this, we propose a framework for privacy-preserving intermediate feature compression, which addresses the limitations in both compression and privacy that arise in the original extracted feature data. Specifically, we propose a method that jointly enhances privacy and encoding efficiency, achieved through the collaboration of an encoding feature privacy enhancement module and a privacy feature ordering enhancement module. Additionally, we develop a gradient-reversal optimization strategy based on information theory to ensure the utmost concealment of core privacy information throughout the entire codec process. We evaluate the proposed method on two DNN models using two datasets, demonstrating its ability to achieve superior analysis accuracy and higher privacy preservation than HEVC. Furthermore, we provide an application case of a wireless sensor network to validate the effectiveness of the proposed method in a real-world scenario.
Article: "Private compression for intermediate feature in IoT-supported mobile cloud inference" (Displays, vol. 85, Article 102857).
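A minimal illustration of splitting computing (SC) as described above: an off-the-shelf backbone is divided into an on-device head and a cloud-side tail, and the intermediate feature between them is what a framework like the one proposed would compress and privacy-protect before transmission. The backbone choice (ResNet-18) and the split point are assumptions for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Split an off-the-shelf backbone into an on-device head and a cloud-side tail.
backbone = resnet18(weights=None)
modules = list(backbone.children())       # conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc
head = nn.Sequential(*modules[:6])        # runs on the mobile device (up to layer2)
tail = nn.Sequential(*modules[6:-1])      # runs on the cloud server (layer3, layer4, avgpool)
classifier = modules[-1]                  # final fully connected layer (also on the cloud)

image = torch.randn(1, 3, 224, 224)
feature = head(image)                     # intermediate feature: what would be compressed,
                                          # privacy-protected, and transmitted to the cloud
logits = classifier(torch.flatten(tail(feature), 1))
print(feature.shape, logits.shape)        # e.g. (1, 128, 28, 28) and (1, 1000)
```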
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102864
Linlin Wang, Yixuan Zou, Haiyan Wang, Chengqi Xue
Human-computer cooperation guided by natural interaction, intelligent interaction, and human–computer integration is gradually becoming a new trend in human–computer interfaces. An icon is an indispensable pictographic symbol in an interface that can convey pivotal semantics between humans and computers. Research on how humans perceive similar icons and how computers discriminate between them can reduce misunderstandings and facilitate transparent cooperation. Therefore, this research focuses on icon images, their extracted contours, and four contour features: curvature, proportion, orientation, and line. By manipulating these feature values, 360 similar icons were obtained, and a cognitive experiment was conducted with 25 participants to explore the boundary values of the feature dimensions that cause different levels of similarity. These boundary values were then applied in deep learning to train a discrimination model on a dataset of 1500 similar icons. This dataset was used to train a Siamese neural network whose branches use the 16-layer Visual Geometry Group (VGG-16) network. The training process used stochastic gradient descent. This method of combining human cognition and deep learning technology is meaningful for establishing a consensus on icon semantics, including content and emotions, by outputting similarity levels and values. Taking icon similarity discrimination as an example, this study explored analysis and simulation methods that let computer vision approximate human visual cognition. The overall accuracy was 90.82%. Precision was 90% for high similarity, 80.65% for medium, and 97.30% for low. Recall was 100% for high similarity, 89.29% for medium, and 83.72% for low. The results verify that the model can compensate for fuzzy cognition in humans and enable computers to cooperate efficiently.
Article: "Icon similarity model based on cognition and deep learning" (Displays, vol. 85, Article 102864).
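A compact sketch of the Siamese arrangement described above: two weight-sharing VGG-16 branches embed a pair of icons, and the embedding distance serves as the similarity output. The embedding size, pooling, and distance-based head are illustrative assumptions; the study itself reports discrete high/medium/low similarity levels rather than a raw distance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class SiameseIconNet(nn.Module):
    """Sketch: weight-sharing VGG-16 branches that embed a pair of icons."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = vgg16(weights=None).features   # shared convolutional branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.Linear(512, embed_dim)

    def embed_icon(self, x):
        return self.embed(self.pool(self.features(x)).flatten(1))

    def forward(self, icon_a, icon_b):
        za, zb = self.embed_icon(icon_a), self.embed_icon(icon_b)
        # Smaller distance = more similar; thresholds (or a small classifier) would map
        # distances to the high/medium/low similarity levels used in the study.
        return F.pairwise_distance(za, zb)

model = SiameseIconNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # SGD, as in the abstract
dist = model(torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224))  # -> shape (4,)
```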
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102867
Wuchao Li, Tongyin Yang, Pinhao Li, Xinfeng Liu, Shasha Zhang, Jianguo Zhu, Yuanyuan Pei, Yan Zhang, Tijiang Zhang, Rongpin Wang
Background
Non-metastatic clear cell renal cell carcinoma (nccRCC) poses a significant risk of postoperative recurrence and metastasis, underscoring the importance of accurate preoperative risk assessment. While the Leibovich score is effective, it relies on postoperative histopathological data. This study aims to evaluate the efficacy of CT radiomics and deep learning models in predicting Leibovich score risk groups in nccRCC, and to explore the interrelationship between CT and pathological features.
Patients and Methods
This research analyzed 600 nccRCC patients from four datasets, dividing them into low-risk (Leibovich scores of 0–2) and intermediate-to-high-risk (Leibovich scores exceeding 3) groups. A radiological model was developed from subjective CT features, and radiomics and deep learning models were constructed from CT images. Additionally, a deep radiomics model using radiomics and deep learning features was developed, alongside a fusion model incorporating all feature types. Model performance was assessed by AUC values, while survival differences across predicted groups were analyzed using survival curves and the log-rank test. Moreover, the research investigated the interrelationship between CT features and pathological features derived from whole-slide pathological images.
Results
Within the training dataset, four radiological, three radiomics, and thirteen deep learning features were selected to develop models predicting nccRCC Leibovich score risk groups. The deep radiomics model demonstrated superior predictive accuracy, evidenced by AUC values of 0.881, 0.829, and 0.819 in external validation datasets. Notably, significant differences in overall survival were observed among patients classified by this model (log-rank test p < 0.05 across all datasets). Furthermore, a correlation and complementarity were observed between CT deep radiomics features and pathological deep learning features.
Conclusions
The CT deep radiomics model precisely predicts nccRCC Leibovich score risk groups preoperatively and highlights the synergistic effect between CT and pathological data.
Article: "Multicenter evaluation of CT deep radiomics model in predicting Leibovich score risk groups for non-metastatic clear cell renal cell carcinoma" (Displays, vol. 85, Article 102867).
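As a schematic of the "deep radiomics" idea above (combining the selected radiomics and deep learning features into one classifier and scoring it with AUC), here is a hedged scikit-learn sketch on placeholder data. The feature counts mirror the abstract, but the random data, logistic-regression classifier, and preprocessing are assumptions for illustration, not the paper's actual model or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder matrices standing in for the selected features (counts follow the abstract).
X_radiomics = rng.normal(size=(200, 3))      # 3 selected radiomics features
X_deep = rng.normal(size=(200, 13))          # 13 selected deep learning features
y = rng.integers(0, 2, size=200)             # 0 = low risk, 1 = intermediate-to-high risk

# "Deep radiomics": concatenate both feature families and fit a single classifier.
X = np.hstack([X_radiomics, X_deep])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"training AUC = {auc:.3f}")           # a real evaluation would use external validation sets
```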