
Displays: Latest Publications

Degradation-Aware Mixture-of-Experts for Real-World Image Super-Resolution
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-18 · DOI: 10.1016/j.displa.2025.103323
Luyang Xiao , Yixiao Liu , Xiao Liu , Hong Yang , Yuanyuan Wu , Chao Ren
Recovering missing details in low-resolution (LR) images with unknown degradations is the main challenge of the real-world image super-resolution (Real-ISR) task. However, handling all types of unknown degradations with a single fixed model is usually too complex. In this study, we find that the degradations of different real-world images exhibit both commonalities and specificities. We therefore propose a new Mixture-of-Degradation-Experts (MoDE) Transformer network that handles both aspects of degraded images. To process the commonalities of LR images, MoDE blocks with an identical structure are placed at different depths of the network. To process their specificities, each MoDE block contains multiple experts whose parameters are learned adaptively by the network. These experts specialize in different types of degradations, and the network assigns the most appropriate expert to each image with its specific degradation, guided by the proposed degradation representation feature extraction branch. The collaboration among experts at different depths of the network thus completes the Real-ISR task on images with complex and diverse degradations. Extensive experiments show that our approach performs favorably against current state-of-the-art (SOTA) methods.
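The expert-routing idea in this abstract can be made concrete with a minimal PyTorch sketch: a gating head turns a degradation embedding into soft weights over several convolutional experts inside one block. The module names, sizes, and soft-gating scheme below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of degradation-guided expert routing (illustrative assumptions only).
import torch
import torch.nn as nn


class MoDEBlock(nn.Module):
    def __init__(self, channels=64, num_experts=4, deg_dim=128):
        super().__init__()
        # Each expert is a small residual convolutional branch with its own parameters.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_experts)
        )
        # The gate maps a degradation representation vector to expert weights.
        self.gate = nn.Linear(deg_dim, num_experts)

    def forward(self, x, deg_repr):
        # x: (B, C, H, W) image features; deg_repr: (B, deg_dim) degradation embedding.
        weights = torch.softmax(self.gate(deg_repr), dim=-1)        # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C, H, W)
        fused = (weights[:, :, None, None, None] * outputs).sum(dim=1)
        return x + fused  # residual connection keeps the shared (common) pathway intact


feats = torch.randn(2, 64, 32, 32)
deg = torch.randn(2, 128)
print(MoDEBlock()(feats, deg).shape)  # torch.Size([2, 64, 32, 32])
```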
Citations: 0
Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-22 · DOI: 10.1016/j.displa.2026.103355
Yongqing Huo, Wenke Jiang
Video anomaly detection (VAD) is critical for the automated identification of anomalous behaviors in surveillance systems, with applications in public safety, intelligent transportation, and healthcare. However, as application domains continue to expand, ensuring that VAD algorithms maintain excellent detection performance across diverse scenarios has become a primary focus of current research. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method exploits multi-scale feature information within frames more effectively to learn the feature distribution. In the encoder, we construct a convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate a multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in the skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of these modules, the proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue, and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method on anomaly detection tasks.
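A common way to realize the feature memory module described above is a bank of learnable "normal" prototypes read by cosine-similarity attention; the following sketch illustrates that idea under assumed slot counts and dimensions, not the paper's exact design.

```python
# Minimal sketch of a normality memory read (slot count and dimensions are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureMemory(nn.Module):
    def __init__(self, num_slots=50, dim=256):
        super().__init__()
        # Learnable prototypes of "normal" feature patterns.
        self.slots = nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, queries):
        # queries: (N, dim) encoder features (e.g., flattened spatial positions).
        attn = F.softmax(
            F.normalize(queries, dim=-1) @ F.normalize(self.slots, dim=-1).t(), dim=-1
        )                                    # (N, num_slots) addressing weights
        read = attn @ self.slots             # features re-expressed through normal prototypes
        return read, attn


mem = FeatureMemory()
feats = torch.randn(8, 256)
recon, weights = mem(feats)
# Anomalous features reconstruct poorly from normal prototypes, which raises the
# prediction error used as the anomaly score.
print(recon.shape, weights.shape)
```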
Citations: 0
MeAP: dual level memory strategy augmented transformer based visual object predictor
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-20 · DOI: 10.1016/j.displa.2026.103356
Shiliang Yan, Yinling Wang, Dandan Lu, Min Wang
Exploring and resolving persistent noise incursions within tracking sequences, especially occlusion, illumination variation, and fast motion, has garnered substantial attention for its role in enhancing the accuracy and robustness of visual object trackers. However, existing visual object trackers equipped with template-updating mechanisms or calibration strategies rely heavily on time-consuming historical data to achieve optimal tracking performance, impeding their real-time capability. To address these challenges, this paper introduces a long-short-term dual-level memory-augmented Transformer-based visual object predictor (MeAP). The key contributions of MeAP can be summarized as follows: 1) a noise model for specific invasion events is formulated from incursion effects, together with corresponding template strategies that serve as the foundation for more efficient memory utilization; 2) a memory exploration scheme based on an online tracking mask-based feature extraction strategy and a Transformer architecture is introduced to mitigate the impact of noise invasion during memory vector construction; 3) a memory utilization scheme based on target basic features and a dual-feature target mask predictor is provided, which supplies scene-edge features to the mask-based feature extraction method and jointly predicts the accurate location of the tracking target. Extensive experiments on the OTB100, NFS, VOT2021, and AVisT benchmarks demonstrate that MeAP, with its introduced modules, achieves tracking performance comparable to other state-of-the-art (SOTA) trackers, and operates at an average speed of 31 frames per second (FPS) across the four benchmarks.
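The long-short-term dual-level memory can be pictured with a generic sketch: a FIFO buffer keeps recent template features while an exponential moving average maintains a slowly updated long-term prototype. Both mechanisms and all sizes below are assumptions chosen to convey the idea, not MeAP's actual modules.

```python
# Illustrative dual-level (short/long-term) template memory for tracking (assumed design).
import collections
import torch


class DualLevelMemory:
    def __init__(self, short_len=5, momentum=0.99):
        self.short = collections.deque(maxlen=short_len)  # recent template features
        self.long = None                                   # slowly updated prototype
        self.momentum = momentum

    def update(self, template_feat, reliable=True):
        self.short.append(template_feat)
        if reliable:  # only confident frames refresh the long-term memory
            self.long = (
                template_feat if self.long is None
                else self.momentum * self.long + (1 - self.momentum) * template_feat
            )

    def read(self):
        short_mean = torch.stack(list(self.short)).mean(dim=0)
        return short_mean if self.long is None else 0.5 * (short_mean + self.long)


mem = DualLevelMemory()
for _ in range(10):
    mem.update(torch.randn(256))
print(mem.read().shape)  # torch.Size([256])
```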
Citations: 0
Omnidirectional image quality assessment via multi-perceptual feature fusion
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-11-26 · DOI: 10.1016/j.displa.2025.103302
Cheng Zhang , Shucun Si , Bo Zhang , Jiaying Wang
Omnidirectional images are integral to virtual reality (VR) applications, yet their high resolution and spatial complexity present unique challenges for quality assessment. Current omnidirectional image quality assessment (OIQA) techniques still struggle to extract multi-perceptual features and to model interrelationships across consecutive viewports, which makes it difficult to replicate the subjective perception of the human eye. In response, this research proposes a multi-perceptual feature aggregation-based omnidirectional image quality assessment approach. The method creates a pseudo-temporal input by transforming the equirectangular projection (ERP) omnidirectional image into a series of viewports, simulating a user's multi-viewport browsing journey. To improve frequency-domain feature extraction, the backbone network combines a convolutional neural network with 2D wavelet transform convolution (WTConv); this module decomposes the signal in the frequency domain while maintaining spatial information, making it easier to identify high-frequency features and structural defects in pictures. To better capture the continuous relationship between viewports, a temporal shift module (TSM) is added, which dynamically shifts viewport features along the channel dimension, thereby improving the model's perception of viewpoint continuity and spatial consistency. Additionally, the model incorporates the self-channel attention (SCA) mechanism to merge various perceptual characteristics and amplify salient feature expression, further improving the perception of important distortion regions. Experiments conducted on the OIQA and CVIQD standard datasets show that our proposed model achieves excellent performance compared with existing full-reference and no-reference methods.
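The temporal shift module (TSM) mentioned above has a simple, well-known form: a fraction of feature channels is shifted forward and backward along the pseudo-temporal (viewport) axis so that neighboring viewports exchange information. The sketch below assumes a 1/8 shift ratio, which is a common but not stated choice.

```python
# Minimal temporal shift over a viewport sequence (1/8 shift ratio is an assumption).
import torch


def temporal_shift(x, shift_div=8):
    # x: (B, T, C, H, W) features of T consecutive viewports.
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift one channel slice forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift another slice backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels untouched
    return out


viewports = torch.randn(2, 8, 64, 32, 32)
print(temporal_shift(viewports).shape)  # torch.Size([2, 8, 64, 32, 32])
```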
Citations: 0
DepressionLLM: Emotion- and causality-aware depression detection with foundation models
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-04 · DOI: 10.1016/j.displa.2025.103304
Shiyu Teng , Jiaqing Liu , Hao Sun , Yue Huang , Rahul Kumar Jain , Shurong Chai , Ruibo Hou , Tomoko Tateyama , Lanfen Lin , Lang He , Yen-Wei Chen
Depression is a complex mental health issue that is often reflected through subtle multimodal signals in speech, facial expressions, and language. However, existing approaches using large language models (LLMs) face limitations in integrating these diverse modalities and providing interpretable insights, restricting their effectiveness in real-world and clinical settings. This study presents a novel framework that leverages foundation models for interpretable multimodal depression detection. Our approach follows a three-stage process. First, pseudo-labels enriched with emotional and causal cues are generated using a pretrained language model (GPT-4o), expanding the training signal beyond ground-truth labels. Second, a coarse-grained learning phase employs another model (Qwen2.5) to capture relationships among depression levels, emotional states, and inferred reasoning. Finally, a fine-grained tuning stage fuses video, audio, and text inputs via a multimodal prompt fusion module to construct a unified depression representation. We evaluate our framework on the E-DAIC, CMDC, and EATD benchmark datasets, demonstrating consistent improvements over state-of-the-art methods on both depression detection and causal reasoning tasks. By integrating foundation models with multimodal video understanding, our work offers a robust and interpretable solution for mental health analysis, contributing to the advancement of multimodal AI in clinical and real-world applications.
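One plausible reading of the multimodal prompt fusion stage is prefix-style fusion: per-modality features are projected into the language model's embedding space and prepended to the text tokens. The sketch below follows that reading; all dimensions, token counts, and the linear projections are assumptions rather than the paper's architecture.

```python
# Hedged sketch of prefix-style multimodal prompt fusion (assumed, not the paper's module).
import torch
import torch.nn as nn


class PromptFusion(nn.Module):
    def __init__(self, video_dim=512, audio_dim=256, llm_dim=1024, tokens_per_modality=4):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, llm_dim * tokens_per_modality)
        self.audio_proj = nn.Linear(audio_dim, llm_dim * tokens_per_modality)
        self.n = tokens_per_modality
        self.llm_dim = llm_dim

    def forward(self, video_feat, audio_feat, text_token_embeds):
        # video_feat: (B, video_dim), audio_feat: (B, audio_dim),
        # text_token_embeds: (B, L, llm_dim) embeddings from the LLM's embedding layer.
        b = video_feat.size(0)
        v = self.video_proj(video_feat).view(b, self.n, self.llm_dim)
        a = self.audio_proj(audio_feat).view(b, self.n, self.llm_dim)
        # Prepend modality "prompt tokens" so the LLM attends to them jointly with text.
        return torch.cat([v, a, text_token_embeds], dim=1)


fusion = PromptFusion()
out = fusion(torch.randn(2, 512), torch.randn(2, 256), torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 24, 1024])
```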
Citations: 0
Unleash and integrate the power of pre-trained ViTs via feature fusion for open-vocabulary object detection
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-15 · DOI: 10.1016/j.displa.2025.103321
Xiangyu Gao, Yu Dai, Taijin Zhao, Benliu Qiu, Lanxiao Wang, Heqian Qiu, Qingbo Wu, Hongliang Li
Alleviating over-fitting is one of the major concerns in open-vocabulary object detection (OVOD). Most OVOD methods rely on base-data training and inherit their model structure from closed-set detectors, with final predictions derived from the features extracted by the backbone. Backbone design therefore plays a key role in improving generalization capacity. However, existing works choose either a fully optimizable network or a single frozen visual encoder as the backbone, which limits the representation capacity of backbone features for OVOD and leads to sub-optimal performance. We therefore propose a novel multi-branch backbone network, named ViT-Feature-Modulated Multi-Scale Convolutional Network (VMCNet), which effectively integrates and unleashes the power of multiple pre-trained ViTs via the proposed feature fusion strategy. Drawing an analogy to the modulation mechanism in communication, we use an additional lightweight CNN branch to produce multi-scale carrier features, which then modulate the representations from the pre-trained ViTs to obtain the final detection features. Our method not only leverages the information from base data but also utilizes knowledge from multiple ViTs taken from CLIP and SAM, ensembling their knowledge and generalization ability for the OVOD setting. Equipped with the proposed backbone network, the detector achieves better performance on novel categories. Evaluated on two popular benchmarks, our method boosts detection performance on novel categories and outperforms state-of-the-art methods. On OV-COCO, the proposed method achieves 47.5 novel-category AP50 with ViT-B/16 and 52.8 with ViT-L/14; on OV-LVIS, VMCNet with ViT-B/16 reaches 27.7 mAPr.
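The "carrier feature" modulation can be pictured with a FiLM-style sketch in which a lightweight CNN branch predicts per-location scale and shift maps applied to frozen ViT features. The FiLM formulation and all sizes below are assumptions used only for illustration, not VMCNet's actual fusion strategy.

```python
# Sketch of CNN-carrier modulation of frozen ViT features (FiLM-style, assumed).
import torch
import torch.nn as nn


class CarrierModulation(nn.Module):
    def __init__(self, vit_dim=768, carrier_dim=256):
        super().__init__()
        self.carrier_cnn = nn.Sequential(                # trainable lightweight branch
            nn.Conv2d(3, carrier_dim, 3, stride=16, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_scale = nn.Conv2d(carrier_dim, vit_dim, 1)
        self.to_shift = nn.Conv2d(carrier_dim, vit_dim, 1)

    def forward(self, image, vit_feat):
        # image: (B, 3, H, W); vit_feat: (B, vit_dim, H/16, W/16) from a frozen ViT.
        carrier = self.carrier_cnn(image)
        scale, shift = self.to_scale(carrier), self.to_shift(carrier)
        return vit_feat * (1 + scale) + shift            # modulated detection features


mod = CarrierModulation()
img = torch.randn(1, 3, 256, 256)
vit_feat = torch.randn(1, 768, 16, 16)
print(mod(img, vit_feat).shape)  # torch.Size([1, 768, 16, 16])
```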
Citations: 0
Dual-channel image dehazing algorithm based on spatial-frequency domain feature enhancement
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-16 · DOI: 10.1016/j.displa.2025.103315
Jiameng Yu, Nan Xia, Xinmiao Yu
To address the limitations of traditional single-domain dehazing methods in balancing global scene recovery with local detail preservation, this paper proposes a dual-domain feature enhancement network (DFENet) that achieves precise haze removal and faithful image reconstruction through a cross-domain collaboration mechanism. In the spatial domain, we design two key modules: the global-local feature enhancement module (GLFEM) decouples features and employs joint channel-position attention to simultaneously optimize scene structure and texture detail, while the multiscale feature enhancement module (MSFE) dynamically adapts receptive fields to fuse multiscale features, enhancing robustness in complex scenes. In the frequency domain, we introduce the discrete cosine transform module (DCTM), which strategically learns to select among frequency channels while dynamically filtering and enhancing both high- and low-frequency components. Extensive experiments demonstrate that DFENet outperforms state-of-the-art (SOTA) methods, achieving a PSNR of 39.42 dB on the SOTS-indoor dataset and an improvement of 0.51 dB on SOTS-outdoor. It also performs well on real-world datasets, achieving PSNRs of 21.68 dB and 17.12 dB on NH-HAZE and Dense-Haze, respectively.
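The discrete cosine transform module (DCTM) can be approximated by a small sketch that moves features into the DCT domain per channel, reweights frequency coefficients with learnable gates, and transforms back. The gating scheme and shapes below are assumptions for illustration, not the paper's module.

```python
# Sketch of DCT-domain frequency gating (assumed design, not the paper's DCTM).
import math
import torch
import torch.nn as nn


def dct_matrix(n):
    # Orthonormal DCT-II basis: row k, column j.
    k = torch.arange(n).unsqueeze(1).float()
    j = torch.arange(n).unsqueeze(0).float()
    mat = torch.cos(math.pi * (2 * j + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    mat[0] = mat[0] / math.sqrt(2.0)
    return mat  # (n, n), mat @ mat.T == identity


class DCTGate(nn.Module):
    def __init__(self, channels, size):
        super().__init__()
        self.register_buffer("D", dct_matrix(size))
        self.gate = nn.Parameter(torch.ones(channels, size, size))  # per-frequency weights

    def forward(self, x):
        # x: (B, C, size, size)
        freq = self.D @ x @ self.D.t()        # 2D DCT over the spatial dims
        freq = freq * self.gate               # amplify or suppress frequency bands
        return self.D.t() @ freq @ self.D     # inverse transform back to the spatial domain


layer = DCTGate(channels=32, size=16)
print(layer(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 32, 16, 16])
```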
Citations: 0
Rethinking low-light image enhancement: A local–global synergy perspective
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-13 · DOI: 10.1016/j.displa.2026.103348
Qinghua Lin , Yu Long , Xudong Xiong , Wenchao Jiang , Zhihua Wang , Qiuping Jiang
Low-light image enhancement (LLIE) remains a challenging task due to complex degradations in illumination, contrast, and structural detail. Deep neural network-based approaches have shown promising results for LLIE. However, most existing methods rely either on convolutional layers with local receptive fields, which are well suited to restoring local textures, or on Transformer layers with long-range dependencies, which are better at correcting global illumination; despite their respective strengths, these approaches often struggle to handle both aspects simultaneously. In this paper, we revisit LLIE from a local–global synergy perspective and propose a unified framework, the Local–Global Synergy Network (LGS-Net). LGS-Net explicitly extracts local and global features in parallel using a separable CNN and a Swin Transformer block, respectively, effectively modeling both local structural fidelity and global illumination balance. The extracted features are then fed into a squeeze-and-excitation-based fusion module, which adaptively integrates multi-scale information guided by perceptual relevance. Extensive experiments on multiple real-world benchmarks show that our method consistently outperforms existing state-of-the-art methods on both quantitative metrics (e.g., PSNR, SSIM, Q-Align) and perceptual quality, with notable improvements in color fidelity and detail preservation under extreme low-light and non-uniform illumination.
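The squeeze-and-excitation-based fusion of the two branches admits a compact sketch: concatenate the local and global feature maps, squeeze them by global average pooling, excite channel weights with a small MLP, and project back. Channel sizes and the concatenate-then-reweight layout below are illustrative assumptions.

```python
# Minimal squeeze-and-excitation fusion of a local and a global branch (assumed sizes).
import torch
import torch.nn as nn


class SEFusion(nn.Module):
    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (B, C, H, W) from the two parallel branches.
        x = torch.cat([local_feat, global_feat], dim=1)      # (B, 2C, H, W)
        squeeze = x.mean(dim=(2, 3))                         # global average pooling
        weights = self.excite(squeeze)[:, :, None, None]     # channel attention weights
        return self.proj(x * weights)                        # fused representation


fuse = SEFusion()
print(fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)).shape)
```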
Citations: 0
Endo-E2E-GS: End-to-end 3D reconstruction of endoscopic scenes using Gaussian Splatting
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-13 · DOI: 10.1016/j.displa.2026.103353
Xiongzhi Wang , Boyu Yang , Min Wei , Yu Chen , Jingang Zhang , Yunfeng Nie
Three-dimensional (3D) reconstruction is essential for enhancing spatial perception and geometric understanding in minimally invasive surgery. However, current methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) often rely on offline preprocessing, such as COLMAP-based point clouds or multi-frame fusion, limiting their adaptability and clinical deployment. We propose Endo-E2E-GS, a fully end-to-end framework that reconstructs structured 3D Gaussian fields directly from a single stereo endoscopic image pair. The system integrates (1) a DilatedResNet-based stereo depth estimator for robust geometry inference in low-texture scenes, (2) a Gaussian attribute predictor that infers per-pixel rotation, scale, and opacity, and (3) a differentiable splatting renderer for 2D view supervision. Evaluated on the ENDONERF and SCARED datasets, Endo-E2E-GS achieves highly competitive performance, reaching PSNR values of 38.874/33.052 and SSIM scores of 0.978/0.863, respectively, surpassing recent state-of-the-art approaches. It requires no explicit scene initialization and demonstrates consistent performance across the two representative endoscopic datasets. Code is available at: https://github.com/Intelligent-Imaging-Center/Endo-E2E-GS.
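The stereo-geometry step underlying the depth estimator follows the standard pinhole relation Z = f·B/d; the sketch below converts an assumed disparity map into metric depth and unprojects it to 3D points that could seed per-pixel Gaussians. The intrinsics and baseline are placeholders, not calibration values from the paper.

```python
# Worked sketch of disparity-to-depth-to-points stereo geometry (placeholder calibration).
import numpy as np

def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    # disparity: (H, W) in pixels; returns (H, W, 3) points in the camera frame.
    h, w = disparity.shape
    z = fx * baseline / np.clip(disparity, 1e-3, None)   # depth from stereo geometry Z = f*B/d
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * z / fx                                 # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

disp = np.random.uniform(5.0, 40.0, size=(480, 640))
points = disparity_to_points(disp, fx=500.0, fy=500.0, cx=320.0, cy=240.0, baseline=0.004)
print(points.shape)  # (480, 640, 3)
```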
Citations: 0
Real-time laser speckle myocardial blood flow imaging system in vivo
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-18 · DOI: 10.1016/j.displa.2025.103328
Ren Bin Wang , Yuan Yuan , Hong Li Liu , Kai Jing Shang , Wei Nan Gao , Yong Bi , Yang Yu
Laser speckle contrast imaging (LSCI) is a real-time, full-field, non-contact imaging technique widely used for blood-flow visualization in biomedical applications. LSCI could potentially be applied to monitor the spatiotemporal evolution of the myocardial coronary arteries and thereby improve surgical quality in coronary artery bypass grafting (CABG). The functionality of a myocardial LSCI device has been demonstrated on animal hearts using offline data post-processing, but a corresponding real-time LSCI system, which would be of great value in assisting surgeons with intraoperative diagnosis during CABG, has not yet been realized. This paper develops a high-speed laser speckle myocardial blood flow imaging (LSMBFI) system for measuring blood-flow perfusion in real time. Through parallel computing and asynchronous programming, combined with the speckle contrast-to-blood-speed relationship, our LSMBFI system displays blood flow index (BFI) images of 1456 × 1088 pixels at a frame rate of 68 Hz; for smaller regions of interest, such as 1000 × 1000 pixels, the display frame rate reaches 120 Hz. Phantom and animal experiments are designed for validation. To the best of our knowledge, our LSMBFI system is the first to realize real-time monitoring of spatial and temporal myocardial blood-flow perfusion on the beating heart. The results will contribute to improving surgical quality control in CABG on large animals or humans and support the future engineering application of the LSMBFI instrument.
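The speckle contrast-to-blood-speed relationship referenced above is typically computed as K = sigma/mean over a sliding window, with the blood flow index approximated as BFI proportional to 1/K^2. The sketch below follows that standard LSCI practice; the window size and the 1/K^2 index are assumptions, not the authors' exact calibration.

```python
# Worked sketch of local speckle contrast and a 1/K^2 blood flow index (standard LSCI practice).
import numpy as np
from scipy.ndimage import uniform_filter

def blood_flow_index(raw_speckle, window=7, eps=1e-6):
    # raw_speckle: (H, W) raw speckle intensity image.
    mean = uniform_filter(raw_speckle, size=window)
    mean_sq = uniform_filter(raw_speckle ** 2, size=window)
    var = np.clip(mean_sq - mean ** 2, 0.0, None)     # local variance over the window
    k = np.sqrt(var) / (mean + eps)                   # local speckle contrast K
    return 1.0 / (k ** 2 + eps)                       # higher flow -> lower contrast -> higher BFI

frame = np.random.rand(1088, 1456).astype(np.float32)
bfi = blood_flow_index(frame)
print(bfi.shape)  # (1088, 1456)
```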
Citations: 0