Displays: Latest Articles (Page 9)

Evaluation and application strategy of low blue light mode of desktop display based on brightness characteristics

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-08-10 DOI: 10.1016/j.displa.2024.102809

Wenqian Xu , Peiyu Wu , Qi Yao , Rongjun Zhang , Bang Qin , Dong Wang , Shenfei Chen , Yedong Shen

Long-term use of desktop displays may increase the burden on the visual system, and users can enable a low blue light mode to protect their eyes with respect to circadian effects. In this work, we investigated the influence of this mode from two aspects: the brightness-visual effect, namely efficacy and circadian effect, and color quality, namely the color difference Δu′v′ (the chromaticity coordinate offset between two colors) and D_uv (the deviation from the blackbody locus). A decrease in brightness is accompanied by an increase in efficacy and a diminishing circadian effect. Blue, cyan, and magenta have the largest Δu′v′, and the lower the saturation, the greater the Δu′v′. The lower the correlated color temperature (CCT), the greater the D_uv, that is, the farther the chromaticity lies from the Planckian locus. We summarize three low blue light mode adjustment strategies based on the red, green, and blue three-channel ratio of the spectrum, and propose an optimized mode using a genetic algorithm, which has two optional CCT ranges of 3500–5000 K and 2700–3000 K. Furthermore, we establish the relationship between brightness and gamut coverage to refine the screen brightness range for low blue light mode. This research provides valuable insights into the application of low blue light mode and its implications for human-centric healthy displays.
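The color difference Δu′v′ mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard CIE 1976 (u′, v′) chromaticity definition; the function names and the two sample colors (D65 white and CIE illuminant A as a stand-in for a warmer, lower-blue white) are illustrative choices, not values taken from the paper.

```python
import math


def xyz_to_upvp(x, y, z):
    """Convert CIE XYZ tristimulus values to CIE 1976 (u', v') chromaticity."""
    denom = x + 15.0 * y + 3.0 * z
    if denom == 0.0:
        raise ValueError("XYZ must not be all zero")
    return 4.0 * x / denom, 9.0 * y / denom


def delta_upvp(xyz_a, xyz_b):
    """Euclidean distance between two colors in the (u', v') plane."""
    ua, va = xyz_to_upvp(*xyz_a)
    ub, vb = xyz_to_upvp(*xyz_b)
    return math.hypot(ua - ub, va - vb)


# Illustrative inputs (assumptions, not measurements from the paper):
d65_white = (95.047, 100.0, 108.883)   # CIE standard illuminant D65
warm_white = (109.850, 100.0, 35.585)  # CIE standard illuminant A

shift = delta_upvp(d65_white, warm_white)
print(f"delta u'v' between the two whites: {shift:.4f}")
```

D_uv additionally requires the nearest point on the Planckian locus, which this sketch does not attempt to compute.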

Citations: 0

Human pose estimation in complex background videos via Transformer-based multi-scale feature integration

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-08-08 DOI: 10.1016/j.displa.2024.102805

Chen Cheng, Huahu Xu

Human pose estimation remains an active research topic. Previous algorithms based on traditional machine learning suffer from difficult feature extraction and low fusion efficiency. To address these problems, we propose a Transformer-based method that combines three components to capture the human pose: a Transformer-based feature extraction module, a multi-scale feature fusion module, and an occlusion processing mechanism. The feature extraction module uses the self-attention mechanism to extract key features from the input sequence; the multi-scale feature fusion module fuses feature information at different scales to enhance the perception ability of the model; and the occlusion processing mechanism handles occlusion in the data and removes background interference. Verified on the standard Human3.6M dataset and an in-the-wild video dataset, our method achieves accurate pose prediction for both complex actions and challenging samples.
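The self-attention step that the feature extraction module relies on can be sketched as follows. This is generic single-head scaled dot-product self-attention in NumPy, not the authors' module; the projection matrices, sequence length, and feature sizes are hypothetical.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) input features.
    w_q, w_k, w_v: (d_model, d_head) projection matrices.
    Returns the attended features and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical sizes: 6 frame tokens, 16-dim features, 8-dim head.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
features, attn = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over the input positions, so every output feature is a weighted mix of all tokens in the sequence.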

Citations: 0

Development of low-temperature polycrystalline silicon process and novel 2T2C driving circuits for electric paper

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-08-08 DOI: 10.1016/j.displa.2024.102808

Yu Jin , Ying Shen , Wen-Jie Xu , Wen-Zhi Fan , Lei Xu , Xiao-Yu Gao , Yong Wu , Zhi-Yi Zhou , Wei-Jie Gu , Dong-Liang Yu , Jian-Qiu Sun , Li-Juan Ke , Wei-Bin Zhang , Wei-Qi Xu , Feng-Ying Xu

In this work, we systematically investigate low-temperature polycrystalline silicon (LTPS)-based driving circuits for electronic paper, with the aim of adopting a small width/length ratio (W/L) in LTPS-based thin-film transistors (TFTs) to reduce switch error and thereby mitigate image sticking. First, through detailed exploration of LTPS process technology, we obtained LTPS-TFTs with extremely low off-state leakage current (I_OFF) even at a large source-drain voltage (V_DS) of 30 V. Meanwhile, owing to their extremely high field-effect mobility (approximately 100 cm²/V⋅s), the high on-state current (I_ON) of the LTPS-TFTs also meets the requirement of fast signal writing to the storage capacitor, making it possible to fabricate TFTs with a relatively small W/L and thereby minimize switch error. The I_D-V_D test results reveal that the produced LTPS-TFTs can withstand the maximum voltage difference of 30 V that occurs during product operation. Subsequently, the optimal W/L of the LTPS-TFT was determined experimentally. Reliability tests were then conducted on the obtained LTPS-TFTs, revealing that the threshold voltage (V_TH) shifted by 0.08 V after 7200 s under negative bias temperature stress (NBTS) and by only 0.19 V under positive bias temperature stress (PBTS). The aging test results of these LTPS-TFTs exhibit a new physical phenomenon: the I_OFF of the LTPS-TFTs shows a strict matching relationship with the aging direction. Next, we propose a novel 2T2C driving circuit for the e-paper, which can effectively avoid the adverse effects of I_OFF during the frame holding period, and draw it as an array layout. Finally, we combine the optimal fabrication process of the LTPS-TFTs with the 2T2C driving circuit design to produce an e-paper with outstanding image sticking performance.
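The link between high mobility and a small W/L can be made concrete with a rough sketch. It assumes the textbook long-channel square-law model for the saturation current, I_D = ½·μ·C_ox·(W/L)·(V_GS − V_TH)², which the abstract does not state. The gate capacitance, bias voltages, target current, and the low-mobility comparison device are all hypothetical placeholders.

```python
def saturation_current(mobility, c_ox, w_over_l, v_gs, v_th):
    """Square-law saturation current in amperes.

    mobility in cm^2/(V*s), c_ox in F/cm^2, voltages in volts.
    """
    overdrive = v_gs - v_th
    if overdrive <= 0:
        return 0.0  # device is off in this simple model
    return 0.5 * mobility * c_ox * w_over_l * overdrive ** 2


def required_w_over_l(target_current, mobility, c_ox, v_gs, v_th):
    """W/L needed to reach target_current under the same square-law model."""
    overdrive = v_gs - v_th
    if overdrive <= 0:
        raise ValueError("gate overdrive must be positive")
    return 2.0 * target_current / (mobility * c_ox * overdrive ** 2)


# Hypothetical operating point (placeholders, not values from the paper).
C_OX = 3.0e-8      # F/cm^2
V_GS, V_TH = 15.0, 2.0
TARGET = 1.0e-5    # A, on-current assumed sufficient for pixel charging

wl_ltps = required_w_over_l(TARGET, 100.0, C_OX, V_GS, V_TH)  # ~100 cm^2/(V*s), as in the abstract
wl_low = required_w_over_l(TARGET, 1.0, C_OX, V_GS, V_TH)     # hypothetical low-mobility TFT
print(f"W/L at 100 cm^2/Vs: {wl_ltps:.4f}; W/L at 1 cm^2/Vs: {wl_low:.4f}")
```

Under this model the required W/L scales inversely with mobility, which is the sense in which a high-mobility device can meet a charging requirement with a narrower channel.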

Citations: 0

Review on SLAM algorithms for Augmented Reality

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-31 DOI: 10.1016/j.displa.2024.102806

Xingdong Sheng , Shijie Mao , Yichao Yan , Xiaokang Yang

Augmented Reality (AR) has gained significant attention in recent years as a technology that enhances the user’s perception of and interaction with the real world by overlaying virtual objects. Simultaneous Localization and Mapping (SLAM) plays a crucial role in enabling AR applications by allowing the device to determine its position and orientation in the real world while mapping the environment. This paper first summarizes recent AR products and SLAM algorithms, and presents a comprehensive overview of SLAM algorithms, including feature-based, direct, and deep learning-based methods, highlighting their advantages and limitations. It then provides an in-depth exploration of classical SLAM algorithms for AR, with a focus on visual SLAM and visual-inertial SLAM. Lastly, sensor configuration, datasets, and performance evaluation for AR SLAM are discussed. The review concludes with a summary of the current state of SLAM algorithms for AR and provides insights into future directions for research and development in this field. Overall, this review serves as a resource for researchers and engineers who are interested in understanding the advancements and challenges in SLAM algorithms for AR.

Citations: 0

High-resolution enhanced cross-subspace fusion network for light field image superresolution

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-29 DOI: 10.1016/j.displa.2024.102803

Shixu Ying , Shubo Zhou , Xue-Qin Jiang , Yongbin Gao , Feng Pan , Zhijun Fang

Light field (LF) images offer abundant spatial and angular information, and combining the two benefits LF image superresolution (LF image SR). Existing methods often decompose the 4D LF data into low-dimensional subspaces, extract features from each subspace individually, and then fuse them for LF image SR. However, the performance of these methods is restricted because they lack effective correlations between subspaces and miss crucial complementary information needed to capture rich texture details. To address this, we propose a cross-subspace fusion network for LF spatial SR (i.e., CSFNet). Specifically, we design the progressive cross-subspace fusion module (PCSFM), which progressively establishes cross-subspace correlations based on a cross-attention mechanism to comprehensively enrich LF information. Additionally, we propose a high-resolution adaptive enhancement group (HR-AEG), which preserves texture and edge details in the high-resolution feature domain by employing a multibranch enhancement method and an adaptive weight strategy. The experimental results demonstrate that our approach achieves highly competitive performance on multiple LF datasets compared to state-of-the-art (SOTA) methods.
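The decomposition of 4D LF data into low-dimensional subspaces can be pictured with plain array slicing. The sketch below assumes a light field stored as an array of shape (U, V, H, W), with U×V angular views of H×W pixels. The axis order, sizes, and function names are illustrative conventions, not those of CSFNet.

```python
import numpy as np

# Hypothetical light field: 5x5 views of 32x48 pixels, random data for illustration.
U, V, H, W = 5, 5, 32, 48
lf = np.random.default_rng(0).random((U, V, H, W))


def sub_aperture_image(lf, u, v):
    """Spatial subspace: the full image seen from one angular view (H, W)."""
    return lf[u, v]


def macro_pixel(lf, h, w):
    """Angular subspace: one scene pixel seen from every view (U, V)."""
    return lf[:, :, h, w]


def horizontal_epi(lf, u, h):
    """Horizontal epipolar-plane image: fixed view row and pixel row (V, W)."""
    return lf[u, :, h, :]


def vertical_epi(lf, v, w):
    """Vertical epipolar-plane image: fixed view column and pixel column (U, H)."""
    return lf[:, v, :, w]
```

Each slice keeps two of the four dimensions, which is why features extracted from one subspace alone miss information carried by the others.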

Citations: 0

A dense video caption dataset of student classroom behaviors and a baseline model with boundary semantic awareness

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-26 DOI: 10.1016/j.displa.2024.102804

Yong Wu , Jinyu Tian , HuiJun Liu , Yuanyan Tang

Dense video captioning automatically locates events in untrimmed videos and describes their contents in natural language. This task has many potential applications, including security, assisting people who are visually impaired, and video retrieval. The related datasets constitute an important foundation for research on data-driven methods. However, the existing models for building dense video caption datasets were designed for the universal domain and often ignore the characteristics and requirements of a specific domain. In addition, the one-way dataset construction process cannot form a closed-loop iterative scheme to improve the quality of the dataset. Therefore, this paper proposes a novel dataset construction model that is suitable for classroom-specific scenarios. On this basis, the Dense Video Caption Dataset of Student Classroom Behaviors (SCB-DVC) is constructed. Additionally, existing dense video captioning methods typically use only temporal event boundaries as direct supervisory information during localization and fail to consider semantic information. This results in a limited correlation between the localization and captioning stages, and makes it more difficult to locate events in videos with oversmooth boundaries, where the foreground and background of an event are excessively similar in the temporal domain. Therefore, we propose a dense video captioning method based on boundary localization assisted by fine-grained semantic awareness. By introducing semantic-aware information, this method enhances the ability to learn the features that differentiate the foreground of an event from its background. It provides increased boundary perception and achieves more accurate captions. Experimental results show that the proposed method performs well on both the SCB-DVC dataset and public datasets (ActivityNet Captions, YouCook2 and TACoS). We will release the SCB-DVC dataset soon.
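Temporal event boundaries are usually compared through temporal intersection over union (tIoU). The abstract does not name its localization metric, so the sketch below is a generic illustration of how a predicted segment can be scored against an annotated one; the function name and the example segments are hypothetical.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    if pred_end < pred_start or gt_end < gt_start:
        raise ValueError("segment end must not precede its start")
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a predicted event partially overlapping the annotated one.
score = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

When the foreground and background of an event look alike, small boundary shifts change the tIoU noticeably while the visual evidence barely changes, which is the difficulty the abstract attributes to oversmooth boundaries.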

Citations: 0

CLIP2TF: Multimodal video–text retrieval for adolescent education

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-25 DOI: 10.1016/j.displa.2024.102801

Xiaoning Sun, Tao Fan, Hongxu Li, Guozhong Wang, Peien Ge, Xiwu Shang

With the rapid advancement of artificial intelligence technology, new challenges and opportunities continue to emerge, particularly within the sphere of adolescent education. The current educational system increasingly requires automated detection and evaluation of teaching activities, which offers fresh perspectives for enhancing the quality of adolescent education. Although large-scale models are receiving significant attention in educational research, their high demand for computational resources and their limitations in specific applications constrain their widespread use in analyzing educational video content, especially when handling multimodal data. Current multimodal contrastive learning methods, which integrate video, audio, and text information, have achieved some success in video–text retrieval tasks. However, these methods typically employ simple weighted fusion strategies and fail to avoid noise and information redundancy. Therefore, our study proposes a novel network framework, CLIP2TF, which includes an efficient audio–visual fusion encoder. The encoder aims to let visual and audio features interact and integrate dynamically, enhancing visual features that may be missing or insufficient in specific teaching scenarios while reducing redundant information transfer during modality fusion. Through ablation experiments on the MSRVTT and MSVD datasets, we first demonstrate the effectiveness of CLIP2TF in video–text retrieval tasks. Subsequent tests on teaching video datasets further prove the applicability of the proposed method. This research not only showcases the potential of artificial intelligence in the automated assessment of teaching quality but also provides new directions for research in related fields.
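The simple weighted fusion strategy that the abstract contrasts with its fusion encoder can be sketched as a baseline. The code below is not CLIP2TF: it shows a fixed-weight audio–visual fusion followed by cosine-similarity retrieval, with a hypothetical weight `alpha` and toy embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def weighted_fusion(visual, audio, alpha=0.7):
    """Fixed-weight fusion: the same alpha for every video and every feature."""
    return alpha * visual + (1.0 - alpha) * audio


def rank_videos(text_emb, video_embs):
    """Rank videos by cosine similarity to one text query, best match first."""
    sims = l2_normalize(video_embs) @ l2_normalize(text_emb)
    return np.argsort(-sims), sims


# Toy embeddings (assumptions): three videos with silent audio tracks.
visual = np.eye(3)
audio = np.zeros((3, 3))
query = np.array([0.0, 1.0, 0.0])

order, sims = rank_videos(query, weighted_fusion(visual, audio))
```

Because `alpha` is constant, a noisy or uninformative audio track is mixed in with the same weight as a useful one, which is the redundancy problem the abstract attributes to such strategies.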

Citations: 0

ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-24 DOI: 10.1016/j.displa.2024.102798

Md. Bipul Hossen, Zhongfu Ye, Amr Abdussalam, Mohammad Alamgir Hossain

Fine-grained image captioning is a focal point in the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and utilization of attributes play a crucial role in enhancing image captioning performance. Despite their progress, prior attribute-related methods either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step in the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, which first predicts linguistic context-related attributes and then uses prior probabilities from the IAP module to rebalance image-related and linguistic context-related attributes, thereby generating more robust attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in our proposed image captioning with the enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall model performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K using cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.

Displays, Volume 84, Article 102798.

Citations: 0

DBMKA-Net: Dual branch multi-perception kernel adaptation for underwater image enhancement

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-22 DOI: 10.1016/j.displa.2024.102797

Hongjian Wang, Suting Chen

Because light absorption and scattering in water are wavelength-dependent, underwater photographs often exhibit blurriness, faded color tones, and low contrast. To address these challenges, convolutional neural networks (CNNs), with their robust feature-capturing capabilities and adaptable structures, have been employed for underwater image enhancement. However, most CNN-based studies on underwater image enhancement have not taken into account color space kernel convolution adaptability, which can significantly enhance a model’s expressive capacity. Building on current research on adjusting the color space size for each perceptual field, this paper introduces a Double-Branch Multi-Perception Kernel Adaptive (DBMKA) model. A DBMKA module is constructed from two perceptual branches that adapt the kernels, one based on channel features and one on local image entropy. Additionally, considering the pronounced attenuation of the red channel in underwater images, a Dependency-Capturing Feature Jump Connection (DCFJC) module is designed to capture the red channel’s dependence on the blue and green channels for compensation; its skip mechanism effectively preserves color contextual information. To better utilize the extracted features, a Cross-Level Attention Feature Fusion (CLAFF) module is designed. With the DBMKA, DCFJC, and CLAFF modules, the network can effectively enhance various types of underwater images. Qualitative and quantitative evaluations were conducted on the UIEB and EUVP datasets. In the color correction comparison experiments, the method produced a more uniform red channel distribution across all gray levels, maintaining color consistency and naturalness. For image information entropy (IIE) and average gradient (AG), the data confirmed the method’s superiority in preserving image details. The method also improved other metrics such as MSE and UCIQE by more than 10%, further validating its effectiveness and accuracy.
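The two detail-preservation metrics named in this abstract, image information entropy and average gradient, can be illustrated with a short sketch. The abstract does not state which definitions the authors used, so the code below assumes one common form of each: Shannon entropy of the gray-level histogram, and the mean of sqrt((dx² + dy²) / 2) over forward differences. The function names are hypothetical, and the image is a plain list of rows of integer gray levels.

```python
import math
from collections import Counter


def information_entropy(image):
    """Shannon entropy (in bits) of the gray-level histogram."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_gradient(image):
    """Mean local gradient magnitude from forward differences.

    Assumed definition: average of sqrt((dx^2 + dy^2) / 2) over all
    pixels that have both a right and a lower neighbour.
    """
    rows, cols = len(image), len(image[0])
    if rows < 2 or cols < 2:
        raise ValueError("image must be at least 2x2")
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = image[i][j + 1] - image[i][j]
            dy = image[i + 1][j] - image[i][j]
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))


if __name__ == "__main__":
    flat = [[5, 5], [5, 5]]
    checker = [[0, 255], [255, 0]]
    print(information_entropy(flat), average_gradient(flat))
    print(information_entropy(checker), average_gradient(checker))
```

Under these definitions a uniform image scores zero on both metrics, while an image with more distinct gray levels and sharper transitions scores higher, which is why both are read as indicators of preserved detail.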

Displays, Volume 84, Article 102797.

Citations: 0

Multi-threshold image segmentation using new strategies enhanced whale optimization for lupus nephritis pathological images

IF 3.7 Zone 2 (Engineering Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays

Pub Date : 2024-07-20 DOI: 10.1016/j.displa.2024.102799

Jinge Shi, Yi Chen, Chaofan Wang, Ali Asghar Heidari, Lei Liu, Huiling Chen, Xiaowei Chen, Li Sun

Lupus Nephritis (LN) is considered the most prevalent form of systemic lupus erythematosus. Medical imaging plays an important role in diagnosing and treating LN and can help doctors accurately assess the extent and severity of the lesion. However, relying solely on visual observation and judgment can introduce subjectivity and errors, especially for complex pathological images. Image segmentation techniques differentiate the tissues and structures in medical images to assist doctors in diagnosis. Multi-threshold Image Segmentation (MIS) has gained widespread recognition for its direct and practical application, but existing MIS methods still have some issues. This study therefore combines non-local means, a 2D histogram, and 2D Renyi’s entropy to improve the performance of MIS methods. It also introduces an improved variant of the Whale Optimization Algorithm (GTMWOA) to optimize these MIS methods and reduce algorithm complexity. GTMWOA fuses Gaussian Exploration (GE), Topology Mapping (TM), and Magnetic Liquid Climbing (MLC). GE strengthens the algorithm’s local exploration and speeds up convergence. TM helps the algorithm escape local optima, while the MLC mechanism emulates the physical phenomenon of magnetic liquid climbing and refines the algorithm’s convergence precision. An extensive series of tests on the IEEE CEC 2017 benchmark functions demonstrates the superior performance of GTMWOA on intricate optimization problems. An experiment on Berkeley images and LN images further verifies the superiority of GTMWOA in MIS. The results of the MIS experiments substantiate the algorithm’s advanced capabilities and robustness in handling complex optimization problems.
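To make the idea of entropy-driven multi-threshold segmentation concrete, here is a deliberately simplified sketch. It assumes a 1D gray-level histogram rather than the 2D histogram with non-local means described in the abstract, and it uses exhaustive search as a stand-in for GTMWOA, whose update rules are not given here. The function names and the default order `alpha=0.5` are hypothetical.

```python
import math
from itertools import combinations


def renyi_entropy(probs, alpha=0.5):
    """Renyi entropy of order alpha for one class of histogram bins.

    The bin probabilities are renormalised by the class mass first.
    """
    if alpha <= 0.0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    mass = sum(probs)
    if mass <= 0.0:
        return 0.0  # an empty class contributes nothing
    power_sum = sum((p / mass) ** alpha for p in probs if p > 0.0)
    return math.log(power_sum) / (1.0 - alpha)


def best_thresholds(histogram, num_thresholds, alpha=0.5):
    """Exhaustively pick cut points that maximise the summed class entropy.

    A cut at t separates bins [.., t-1] from bins [t, ..]. Exhaustive
    search only scales to small problems; a metaheuristic would replace
    this loop in practice.
    """
    total = sum(histogram)
    probs = [count / total for count in histogram]
    levels = len(probs)
    best_cut, best_score = None, float("-inf")
    for cut in combinations(range(1, levels), num_thresholds):
        bounds = (0,) + cut + (levels,)
        score = sum(
            renyi_entropy(probs[lo:hi], alpha)
            for lo, hi in zip(bounds, bounds[1:])
        )
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


if __name__ == "__main__":
    # Toy bimodal histogram with an empty valley between the two modes.
    print(best_thresholds([10, 10, 0, 0, 10, 10], num_thresholds=1))
```

The number of candidate cut sets grows combinatorially with the number of thresholds and gray levels, which is the motivation for replacing the exhaustive loop with an optimizer such as the whale-optimization variant the abstract describes.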

Displays, Volume 84, Article 102799.

Citations: 0