Pub Date: 2024-10-30. DOI: 10.1016/j.displa.2024.102862
Fangyuan Zhang, Rukai Wei, Yanzhao Xie, Yangtao Wang, Xin Tan, Lizhuang Ma, Maobin Tang, Lisheng Fan
Prompt learning based on large models shows great potential to reduce training time and resource costs, and it has been progressively applied to visual tasks such as image recognition. Nevertheless, existing prompt learning schemes suffer from either inadequate prompt information from a single modality or insufficient prompt interaction between multiple modalities, resulting in low efficiency and performance. To address these limitations, we propose a Cross-Coupled Prompt Learning (CCPL) architecture, which is designed with two novel components (i.e., a Cross-Coupled Prompt Generator (CCPG) module and a Cross-Modal Fusion (CMF) module) to achieve efficient interaction between visual and textual prompts. Specifically, the CCPG module incorporates a cross-attention mechanism to automatically generate visual and textual prompts, each of which is adaptively updated by the self-attention mechanism in its respective image or text encoder. Furthermore, the CMF module implements a deep fusion to reinforce cross-modal feature interaction at the output layer with the Image–Text Matching (ITM) loss function. We conduct extensive experiments on 8 image datasets. The experimental results verify that our proposed CCPL outperforms the SOTA methods on few-shot image recognition tasks. The source code of this project is released at: https://github.com/elegantTechie/CCPL.
Article: "Cross-coupled prompt learning for few-shot image recognition" (Displays, vol. 85, Article 102862).
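As an editorial illustration of the cross-coupling idea described in the abstract above, the following is a minimal PyTorch-style sketch, not the authors' implementation: learnable visual and textual prompt tokens attend to each other through cross-attention before being handed to their respective encoders. All class names, dimensions, prompt counts, and the residual update are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CrossCoupledPromptGenerator(nn.Module):
    """Hypothetical sketch of cross-coupled prompt generation (names/dims assumed)."""

    def __init__(self, dim=512, n_prompts=4, n_heads=8):
        super().__init__()
        # Learnable prompt tokens for each modality.
        self.visual_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.textual_prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        # Cross-attention: each modality's prompts query the other modality's prompts.
        self.visual_from_text = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.text_from_visual = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, batch_size):
        v = self.visual_prompts.unsqueeze(0).expand(batch_size, -1, -1)
        t = self.textual_prompts.unsqueeze(0).expand(batch_size, -1, -1)
        v_upd, _ = self.visual_from_text(v, t, t)   # visual prompts enriched by textual ones
        t_upd, _ = self.text_from_visual(t, v, v)   # textual prompts enriched by visual ones
        # The coupled prompts would then be prepended to the image/text encoder inputs.
        return v + v_upd, t + t_upd

gen = CrossCoupledPromptGenerator()
visual_p, textual_p = gen(batch_size=8)
print(visual_p.shape, textual_p.shape)  # torch.Size([8, 4, 512]) twice
```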
Pub Date: 2024-10-28. DOI: 10.1016/j.displa.2024.102859
Hangwei Chen, Feng Shao, Baoyang Mu, Qiuping Jiang
Arbitrary style transfer (AST) is a distinctive technique for transferring artistic style into content images, with the goal of generating stylized images that approximate real artistic paintings. Thus, it is natural to develop a quantitative evaluation metric that acts like an artist to accurately assess the quality of AST images. Inspired by this, we present an artist-like network (AL-Net) which can analyze the quality of stylized images like an artist, drawing on fine-grained knowledge of artistic painting (e.g., aesthetics, structure, color, texture). Specifically, the AL-Net consists of three sub-networks: an aesthetic prediction network (AP-Net), a content preservation prediction network (CPP-Net), and a style resemblance prediction network (SRP-Net), which can be regarded as specialized feature extractors that leverage professional artistic painting knowledge through pre-training with different labels. To more effectively predict the final overall quality, we apply transfer learning to integrate the pre-trained feature vectors representing different painting elements into an overall vision quality regression. The loss determined by the overall vision label fine-tunes the parameters of AL-Net, and thus our model can establish a tight connection with human perception. Extensive experiments on the AST-IQAD dataset validate that the proposed method achieves state-of-the-art performance.
Article: "Assessing arbitrary style transfer like an artist" (Displays, vol. 85, Article 102859).
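A hedged sketch of the fusion-and-regression step described above: feature vectors from three pre-trained sub-networks (AP-Net, CPP-Net, SRP-Net) are concatenated and mapped to a single overall-quality score. The sub-network interfaces, feature dimension, and regressor head are illustrative assumptions (the real CPP-Net and SRP-Net presumably also receive the content and style references), not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ALNetRegressor(nn.Module):
    """Sketch: fuse features from three pre-trained sub-networks into one quality score."""

    def __init__(self, ap_net, cpp_net, srp_net, feat_dim=512):
        super().__init__()
        # Pre-trained extractors for aesthetics, content preservation, style resemblance.
        self.ap_net, self.cpp_net, self.srp_net = ap_net, cpp_net, srp_net
        self.regressor = nn.Sequential(
            nn.Linear(3 * feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),
        )

    def forward(self, stylized):
        feats = torch.cat(
            [self.ap_net(stylized), self.cpp_net(stylized), self.srp_net(stylized)], dim=1
        )
        return self.regressor(feats)  # overall vision quality prediction

# Usage with a stand-in extractor that maps an image to a 512-d vector:
stub = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 512))
model = ALNetRegressor(stub, stub, stub)
score = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 1)
```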
Pub Date: 2024-10-28. DOI: 10.1016/j.displa.2024.102868
Brandy Murovec, Julia Spaniol, Behrang Keshavarz
Research on the illusion of self-motion (vection) has primarily focused on younger adults, with few studies including older adults. In light of documented age differences in bottom-up and top-down perception and attention, the current study examined the impact of stimulus properties (speed), cognitive factors (expectancy), and a combination of both (stimulus realism) on vection in younger (18–35 years) and older (65+ years) adults. Through manipulation of the study instructions, participants were led to believe that they were either likely or unlikely to experience vection before they were exposed to a rotating visual stimulus designed to induce circular vection. Realism was manipulated by disrupting the global consistency of a visual stimulus composed of an intact 360° panoramic photograph, resulting in two images (intact, scrambled). The speed of the stimulus was also varied (faster, slower). Vection was measured using self-ratings of onset latency, duration, and intensity. Results showed that intact images produced more vection than scrambled images, especially at faster speeds. In contrast, expectation did not significantly impact vection. Overall, these patterns were similar across both age groups, although younger adults reported earlier vection onsets than older adults at faster speeds. These findings suggest that vection results from an interplay of stimulus-driven and cognitive factors in both younger and older adults.
Article: "The role of image realism and expectation in illusory self-motion (vection) perception in younger and older adults" (Displays, vol. 85, Article 102868).
Pub Date: 2024-10-24. DOI: 10.1016/j.displa.2024.102866
Mengfan Lv, Xiwu Shang, Jiajia Wang, Guoping Li, Guozhong Wang
The rapid growth of video data poses a serious challenge to limited bandwidth. Video coding pre-processing technology can remove coding noise without changing the architecture of the codec; therefore, it can improve coding efficiency while ensuring a high degree of compatibility with existing codecs. However, existing pre-processing methods suffer from feature redundancy and lack an effective mechanism to recover high-frequency details. In view of these problems, we propose a Degradation Compensation and Multi-dimensional Reconstruction (DCMR) pre-processing method for video coding to improve compression efficiency. Firstly, we develop a degradation compensation model, which filters the coding noise in the original video and relieves the frame quality degradation caused by transmission. Secondly, we construct a lightweight multi-dimensional feature reconstruction network that combines residual learning and feature distillation; it enhances and refines the key coding-related features in both the spatial and channel dimensions while suppressing irrelevant features. In addition, we design a weighted guided image filter detail enhancement convolution module, which is specifically used to recover the high-frequency details lost in the denoising process. Finally, we introduce an adaptive discrete cosine transform loss to balance coding efficiency and quality. Experimental results demonstrate that, compared with the original H.266/VVC codec, the proposed DCMR achieves BD-rate (VMAF) and BD-rate (MOS) gains of 21.62% and 12.99%, respectively, on the VVC, UVG, and MCL-JCV datasets.
Article: "DCMR: Degradation compensation and multi-dimensional reconstruction based pre-processing for video coding" (Displays, vol. 85, Article 102866).
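The gains above are reported as Bjøntegaard delta rate (BD-rate) with VMAF or MOS as the quality metric. As a reference for how such numbers are typically computed (this is the standard BD-rate procedure, not code from the paper, and the rate/quality points below are made up), a compact NumPy sketch:

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate: average bitrate change (%) of the test codec vs. the
    anchor at equal quality (e.g., VMAF or MOS). Negative means bitrate savings."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    # Fit cubic polynomials of log-rate as a function of quality.
    p_a = np.polyfit(quality_anchor, log_ra, 3)
    p_t = np.polyfit(quality_test, log_rt, 3)
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Example with made-up rate (kbps) / quality points, four per codec as is typical:
anchor = ([1000, 2000, 4000, 8000], [70.0, 78.0, 84.0, 88.0])
test   = ([ 900, 1800, 3600, 7200], [70.5, 78.5, 84.5, 88.5])
print(round(bd_rate(anchor[0], anchor[1], test[0], test[1]), 2))
```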
Pub Date: 2024-10-23. DOI: 10.1016/j.displa.2024.102863
Jiale Chen, Qiusheng Lian, Baoshun Shi
Low-light image enhancement poses significant challenges due to its ill-posed nature. Recently, deep learning-based methods have attempted to establish a unified mapping relationship between normal-light images and their low-light versions but frequently struggle to capture the intricate variations in brightness conditions. As a result, these methods often suffer from overexposure, underexposure, amplified noise, and distorted colors. To tackle these issues, we propose a brightness-guided normalizing flow framework, dubbed BGFlow, for low-light image enhancement. Specifically, we recognize that low-frequency sub-bands in the wavelet domain carry significant brightness information. To effectively capture the intricate variations in brightness within an image, we design a transformer-based multi-scale wavelet-domain encoder to extract brightness information from the multi-scale features of the low-frequency sub-bands. The extracted brightness feature maps, at different scales, are then injected into the brightness-guided affine coupling layer to guide the training of the conditional normalizing flow module. Extensive experimental evaluations demonstrate the superiority of BGFlow over existing deep learning-based approaches in both qualitative and quantitative assessments. Moreover, we also showcase the exceptional performance of BGFlow on the underwater image enhancement task.
Article: "BGFlow: Brightness-guided normalizing flow for low-light image enhancement" (Displays, vol. 85, Article 102863).
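To make the "brightness-guided affine coupling layer" idea above concrete, here is a minimal conditional affine coupling sketch in PyTorch. The conditioning tensor stands in for the brightness feature maps produced by the wavelet-domain encoder; the channel split, the small convolutional network, and the tanh clamp on the scale are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class BrightnessConditionedCoupling(nn.Module):
    """Sketch of an affine coupling layer conditioned on a brightness feature map."""

    def __init__(self, channels, cond_channels, hidden=64):
        super().__init__()
        self.c1 = channels - channels // 2   # passthrough half
        self.c2 = channels // 2              # transformed half
        self.net = nn.Sequential(
            nn.Conv2d(self.c1 + cond_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * self.c2, 3, padding=1),
        )

    def forward(self, x, brightness):
        x1, x2 = torch.split(x, [self.c1, self.c2], dim=1)
        h = self.net(torch.cat([x1, brightness], dim=1))   # scale/shift from x1 + brightness guide
        log_s, t = h.chunk(2, dim=1)
        log_s = torch.tanh(log_s)                          # keep scales numerically stable
        y2 = x2 * torch.exp(log_s) + t                     # invertible affine transform
        log_det = log_s.flatten(1).sum(dim=1)              # per-sample log|det J| for the flow loss
        return torch.cat([x1, y2], dim=1), log_det

layer = BrightnessConditionedCoupling(channels=12, cond_channels=4)
y, log_det = layer(torch.randn(2, 12, 64, 64), torch.randn(2, 4, 64, 64))
```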
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102861
Sanghyeon Kim, Uijong Ju
People watching video displays for long durations experience visual fatigue and other symptoms associated with visual discomfort. Fatigue-reduction techniques are often applied but may degrade the immersive experience. To adjust such techniques appropriately, the changes in visual fatigue over time must be analyzed. However, conventional methods for assessing visual fatigue are inadequate because they rely entirely on post-task surveys, which cannot easily capture dynamic changes. This study employed a dynamic assessment method for evaluating visual fatigue in real time. Using a joystick, participants continuously evaluated subjective fatigue whenever they perceived changes. A Simulator Sickness Questionnaire (SSQ) validated the results, which indicated significant correlations between the dynamic assessments and the SSQ across five items associated with visual discomfort. Furthermore, we explored the potential relationship between dynamic visual fatigue and objective video features, e.g., optical flow and the V-value (brightness) of the hue/saturation/value (HSV) color space, which together represent the motion and brightness of the video. The results revealed that dynamic visual fatigue significantly correlated with both the optical flow and the V-value. Moreover, based on machine learning models, we determined that changes in visual fatigue can be predicted from the optical flow and V-value. Overall, the results validate that dynamic assessment methods can form a reliable baseline for real-time prediction of visual fatigue.
Article: "Dynamic assessment of visual fatigue during video watching: Validation of dynamic rating based on post-task ratings and video features" (Displays, vol. 85, Article 102861).
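The objective video features correlated with fatigue above (per-frame optical flow and HSV V-value) can be extracted as in the following OpenCV sketch; the Farnebäck parameters and the simple per-frame averaging are assumptions for illustration, not the authors' exact pipeline.

```python
import cv2
import numpy as np

def video_motion_brightness(path):
    """Per-frame mean optical-flow magnitude and mean HSV V-value (brightness)."""
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"cannot read {path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    flow_mag, v_mean = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames (typical parameter values).
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flow_mag.append(float(np.linalg.norm(flow, axis=2).mean()))
        # The V channel of HSV encodes brightness.
        v_mean.append(float(cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[:, :, 2].mean()))
        prev_gray = gray
    cap.release()
    return np.array(flow_mag), np.array(v_mean)
```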
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102860
Ho Sub Lee, Sung In Cho
Image deblurring, which eliminates blurring artifacts to recover details from a given input image, is an important task in the computer vision field. Recently, attention mechanisms combined with deep neural networks (DNNs) have demonstrated promising performance in image deblurring. However, such methods have difficulty learning the complex relationships between blurry and sharp images while balancing spatial detail and high-level contextualized information. Moreover, most existing attention-based DNN methods fail to selectively exploit the information from attention and non-attention branches. To address these challenges, we propose a new approach called Multi-Scale Attention in Attention (MSAiA) for image deblurring. MSAiA incorporates dynamic weight generation by leveraging the joint dependencies of channel and spatial information, allowing for adaptive changes to the weight values of the attention and non-attention branches. In contrast to existing attention mechanisms that primarily consider channel or spatial dependencies and do not adequately utilize the information from attention and non-attention branches, our proposed AiA design combines channel-spatial attention. This attention mechanism effectively utilizes the dependencies between channel and spatial information to allocate weight values to the attention and non-attention branches, enabling the full utilization of information from both branches. Consequently, the attention branch can more effectively incorporate useful information, while the non-attention branch avoids less useful information. Additionally, we employ a novel multi-scale neural network that learns the relationships between blurring artifacts and the original sharp image by further exploiting multi-scale information. The experimental results prove that the proposed MSAiA achieves superior deblurring performance compared with state-of-the-art methods.
Article: "Multi-scale attention in attention neural network for single image deblurring" (Displays, vol. 85, Article 102860).
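A simplified sketch of an "attention in attention" block that adaptively weights an attention branch against a non-attention branch, in the spirit of the design described above. The specific branch definitions, the gating network, and the use of channel-only attention are assumptions made for brevity; the paper's module combines channel and spatial attention at multiple scales.

```python
import torch
import torch.nn as nn

class AttentionInAttention(nn.Module):
    """Sketch: adaptively re-weight an attention branch and a non-attention branch."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Attention branch (channel attention only, for brevity).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Non-attention branch: a plain convolution.
        self.non_attn = nn.Conv2d(channels, channels, 3, padding=1)
        # Gate that generates per-channel weights for the two branches.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, 2 * channels, 1))

    def forward(self, x):
        a = x * self.attn(x)                                   # attention branch
        n = self.non_attn(x)                                   # non-attention branch
        w = self.gate(x).reshape(x.size(0), 2, -1, 1, 1).softmax(dim=1)
        return w[:, 0] * a + w[:, 1] * n                       # dynamically weighted fusion

block = AttentionInAttention(64)
out = block(torch.randn(1, 64, 128, 128))                      # same shape as the input
```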
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102857
Yuan Zhang, Zixi Wang, Xiaodi Guan, Lijun He, Fan Li
In the emerging Internet of Things (IoT) paradigm, mobile cloud inference serves as an efficient application framework that relieves the computation and storage burden on resource-constrained mobile devices by offloading the workload to cloud servers. However, mobile cloud inference faces computation, communication, and privacy challenges in ensuring efficient system inference and protecting the privacy of the information collected by mobile users. To deploy deep neural networks (DNNs) with large capacity, we propose splitting computing (SC), in which the entire model is divided into two parts executed on the mobile and cloud ends, respectively. However, the transmission of intermediate data poses a bottleneck to system performance. This paper first demonstrates the privacy issue arising from machine-analysis-oriented intermediate features. We conduct a preliminary experiment to intuitively reveal the latent potential for enhancing the privacy-preserving ability of the initial feature. Motivated by this, we propose a framework for privacy-preserving intermediate feature compression, which addresses the limitations in both compression and privacy that arise in the original extracted feature data. Specifically, we propose a method that jointly enhances privacy and encoding efficiency, achieved through the collaboration of an encoding feature privacy enhancement module and a privacy feature ordering enhancement module. Additionally, we develop a gradient-reversal optimization strategy based on information theory to ensure the utmost concealment of core privacy information throughout the entire codec process. We evaluate the proposed method on two DNN models using two datasets, demonstrating its ability to achieve superior analysis accuracy and higher privacy preservation than HEVC. Furthermore, we provide an application case of a wireless sensor network to validate the effectiveness of the proposed method in a real-world scenario.
Article: "Private compression for intermediate feature in IoT-supported mobile cloud inference" (Displays, vol. 85, Article 102857).
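A minimal illustration of splitting computing (SC) as described above: an off-the-shelf backbone is divided into an on-device head and a cloud-side tail, and the intermediate feature between them is what a framework like the one proposed would compress and privacy-protect before transmission. The backbone choice (ResNet-18) and the split point are assumptions for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Split an off-the-shelf backbone into an on-device head and a cloud-side tail.
backbone = resnet18(weights=None)
modules = list(backbone.children())       # conv1, bn1, relu, maxpool, layer1..layer4, avgpool, fc
head = nn.Sequential(*modules[:6])        # runs on the mobile device (up to layer2)
tail = nn.Sequential(*modules[6:-1])      # runs on the cloud server (layer3, layer4, avgpool)
classifier = modules[-1]                  # final fully connected layer (also on the cloud)

image = torch.randn(1, 3, 224, 224)
feature = head(image)                     # intermediate feature: what would be compressed,
                                          # privacy-protected, and transmitted to the cloud
logits = classifier(torch.flatten(tail(feature), 1))
print(feature.shape, logits.shape)        # e.g. (1, 128, 28, 28) and (1, 1000)
```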
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102864
Linlin Wang, Yixuan Zou, Haiyan Wang, Chengqi Xue
Human-computer cooperation guided by natural interaction, intelligent interaction, and human–computer integration is gradually becoming a new trend in human–computer interfaces. An icon is an indispensable pictographic symbol in an interface that can convey pivotal semantics between humans and computers. Research on how humans perceive similar icons and how computers discriminate between them can reduce misunderstandings and facilitate transparent cooperation. Therefore, this research focuses on icon images, their extracted contours, and four contour features: curvature, proportion, orientation, and line. By manipulating these feature values, 360 similar icons were obtained, and a cognitive experiment was conducted with 25 participants to explore the boundary values of the feature dimensions that cause different levels of similarity. These boundary values were then applied in deep learning to train a discrimination model on a dataset of 1500 similar icons. This dataset was used to train a Siamese neural network whose branches use the 16-layer Visual Geometry Group (VGG-16) network. The training process used stochastic gradient descent. This method of combining human cognition and deep learning technology is meaningful for establishing a consensus on icon semantics, including content and emotions, by outputting similarity levels and values. Taking icon similarity discrimination as an example, this study explored analysis and simulation methods that let computer vision approximate human visual cognition. The overall accuracy was 90.82%. Precision was 90% for high similarity, 80.65% for medium, and 97.30% for low. Recall was 100% for high similarity, 89.29% for medium, and 83.72% for low. The results verify that the model can compensate for fuzzy cognition in humans and enable computers to cooperate efficiently.
Article: "Icon similarity model based on cognition and deep learning" (Displays, vol. 85, Article 102864).
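A compact sketch of the Siamese arrangement described above: two weight-sharing VGG-16 branches embed a pair of icons, and the embedding distance serves as the similarity output. The embedding size, pooling, and distance-based head are illustrative assumptions; the study itself reports discrete high/medium/low similarity levels rather than a raw distance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class SiameseIconNet(nn.Module):
    """Sketch: weight-sharing VGG-16 branches that embed a pair of icons."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = vgg16(weights=None).features   # shared convolutional branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.Linear(512, embed_dim)

    def embed_icon(self, x):
        return self.embed(self.pool(self.features(x)).flatten(1))

    def forward(self, icon_a, icon_b):
        za, zb = self.embed_icon(icon_a), self.embed_icon(icon_b)
        # Smaller distance = more similar; thresholds (or a small classifier) would map
        # distances to the high/medium/low similarity levels used in the study.
        return F.pairwise_distance(za, zb)

model = SiameseIconNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # SGD, as in the abstract
dist = model(torch.randn(4, 3, 224, 224), torch.randn(4, 3, 224, 224))  # -> shape (4,)
```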
Pub Date: 2024-10-19. DOI: 10.1016/j.displa.2024.102867
Wuchao Li, Tongyin Yang, Pinhao Li, Xinfeng Liu, Shasha Zhang, Jianguo Zhu, Yuanyuan Pei, Yan Zhang, Tijiang Zhang, Rongpin Wang
Background
Non-metastatic clear cell renal cell carcinoma (nccRCC) poses a significant risk of postoperative recurrence and metastasis, underscoring the importance of accurate preoperative risk assessment. While the Leibovich score is effective, it relies on postoperative histopathological data. This study aims to evaluate the efficacy of CT radiomics and deep learning models in predicting Leibovich score risk groups in nccRCC, and to explore the interrelationship between CT and pathological features.
Patients and Methods
This research analyzed 600 nccRCC patients from four datasets, dividing them into low-risk (Leibovich scores of 0–2) and intermediate-to-high-risk (Leibovich scores exceeding 3) groups. A radiological model was developed from subjective CT features, and radiomics and deep learning models were constructed from CT images. Additionally, a deep radiomics model using radiomics and deep learning features was developed, alongside a fusion model incorporating all feature types. Model performance was assessed by AUC values, while survival differences across predicted groups were analyzed using survival curves and the log-rank test. Moreover, the research investigated the interrelationship between CT features and pathological features derived from whole-slide pathological images.
Results
Within the training dataset, four radiological, three radiomics, and thirteen deep learning features were selected to develop models predicting nccRCC Leibovich score risk groups. The deep radiomics model demonstrated superior predictive accuracy, evidenced by AUC values of 0.881, 0.829, and 0.819 in external validation datasets. Notably, significant differences in overall survival were observed among patients classified by this model (log-rank test p < 0.05 across all datasets). Furthermore, a correlation and complementarity were observed between CT deep radiomics features and pathological deep learning features.
Conclusions
The CT deep radiomics model precisely predicts nccRCC Leibovich score risk groups preoperatively and highlights the synergistic effect between CT and pathological data.
Article: "Multicenter evaluation of CT deep radiomics model in predicting Leibovich score risk groups for non-metastatic clear cell renal cell carcinoma" (Displays, vol. 85, Article 102867).
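As a schematic of the "deep radiomics" idea above (combining the selected radiomics and deep learning features into one classifier and scoring it with AUC), here is a hedged scikit-learn sketch on placeholder data. The feature counts mirror the abstract, but the random data, logistic-regression classifier, and preprocessing are assumptions for illustration, not the paper's actual model or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder matrices standing in for the selected features (counts follow the abstract).
X_radiomics = rng.normal(size=(200, 3))      # 3 selected radiomics features
X_deep = rng.normal(size=(200, 13))          # 13 selected deep learning features
y = rng.integers(0, 2, size=200)             # 0 = low risk, 1 = intermediate-to-high risk

# "Deep radiomics": concatenate both feature families and fit a single classifier.
X = np.hstack([X_radiomics, X_deep])
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"training AUC = {auc:.3f}")           # a real evaluation would use external validation sets
```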