
Displays: Latest Publications

Degradation-Aware Mixture-of-Experts for Real-World Image Super-Resolution
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-18 · DOI: 10.1016/j.displa.2025.103323
Luyang Xiao , Yixiao Liu , Xiao Liu , Hong Yang , Yuanyuan Wu , Chao Ren
Recovering missing details in low-resolution (LR) images with unknown degradations is the main challenge of the real-world image super-resolution (Real-ISR) task. However, handling all types of unknown degradations with a single fixed model is usually too complex. In this study, we find that the degradations of different real-world images exhibit both commonalities and specificities. We therefore propose a new Mixture-of-Degradation-Experts (MoDE) Transformer network that handles both aspects of degraded images. To process the commonalities of LR images, MoDE blocks with an identical structure are placed at different depths of the network. To process their specificities, each MoDE block contains multiple experts whose parameters are learned adaptively by the network. These experts specialize in different types of degradations, and the network assigns the most appropriate expert to each image with its specific degradation, guided by the proposed degradation representation feature extraction branch. The collaboration among experts at different depths of the network thus completes the Real-ISR task on images with complex and diverse degradations. Extensive experiments show that our approach performs favorably against current state-of-the-art (SOTA) methods.
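The expert-routing idea in this abstract can be made concrete with a minimal PyTorch sketch: a gating head turns a degradation embedding into soft weights over several convolutional experts inside one block. The module names, sizes, and soft-gating scheme below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of degradation-guided expert routing (illustrative assumptions only).
import torch
import torch.nn as nn


class MoDEBlock(nn.Module):
    def __init__(self, channels=64, num_experts=4, deg_dim=128):
        super().__init__()
        # Each expert is a small residual convolutional branch with its own parameters.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_experts)
        )
        # The gate maps a degradation representation vector to expert weights.
        self.gate = nn.Linear(deg_dim, num_experts)

    def forward(self, x, deg_repr):
        # x: (B, C, H, W) image features; deg_repr: (B, deg_dim) degradation embedding.
        weights = torch.softmax(self.gate(deg_repr), dim=-1)        # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C, H, W)
        fused = (weights[:, :, None, None, None] * outputs).sum(dim=1)
        return x + fused  # residual connection keeps the shared (common) pathway intact


feats = torch.randn(2, 64, 32, 32)
deg = torch.randn(2, 128)
print(MoDEBlock()(feats, deg).shape)  # torch.Size([2, 64, 32, 32])
```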
Citations: 0
Learning video normality for anomaly detection via multi-scale spatiotemporal feature extraction and a feature memory module
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-22 · DOI: 10.1016/j.displa.2026.103355
Yongqing Huo, Wenke Jiang
Video anomaly detection (VAD) is critical for the automated identification of anomalous behaviors in surveillance systems, with applications in public safety, intelligent transportation, and healthcare. However, as application domains continue to expand, ensuring that VAD algorithms maintain excellent detection performance across diverse scenarios has become a primary focus of current research. To enhance the robustness of detection across various environments, we propose a novel autoencoder-based model in this paper. Compared with other algorithms, our method exploits multi-scale feature information within frames more effectively to learn the feature distribution. In the encoder, we construct a convolutional module with multiple kernel sizes and incorporate the designed Spatial-Channel Transformer Attention (SCTA) module to strengthen the feature representation. In the decoder, we integrate a multi-scale feature reconstruction module with Self-Supervised Predictive Convolutional Attentive Blocks (SSPCAB) for more accurate next-frame prediction. Moreover, we introduce a dedicated memory module to capture and store the distribution of normal data patterns. Meanwhile, the architecture employs Conv-LSTM and a specially designed Temporal-Spatial Attention (TSA) module in the skip connections to capture spatiotemporal dependencies across video frames. Benefiting from the design and integration of these modules, the proposed method achieves superior detection performance on public datasets, including UCSD Ped2, CUHK Avenue, and ShanghaiTech. The experimental results demonstrate the effectiveness and versatility of our method on anomaly detection tasks.
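A common way to realize the feature memory module described above is a bank of learnable "normal" prototypes read by cosine-similarity attention; the following sketch illustrates that idea under assumed slot counts and dimensions, not the paper's exact design.

```python
# Minimal sketch of a normality memory read (slot count and dimensions are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureMemory(nn.Module):
    def __init__(self, num_slots=50, dim=256):
        super().__init__()
        # Learnable prototypes of "normal" feature patterns.
        self.slots = nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, queries):
        # queries: (N, dim) encoder features (e.g., flattened spatial positions).
        attn = F.softmax(
            F.normalize(queries, dim=-1) @ F.normalize(self.slots, dim=-1).t(), dim=-1
        )                                    # (N, num_slots) addressing weights
        read = attn @ self.slots             # features re-expressed through normal prototypes
        return read, attn


mem = FeatureMemory()
feats = torch.randn(8, 256)
recon, weights = mem(feats)
# Anomalous features reconstruct poorly from normal prototypes, which raises the
# prediction error used as the anomaly score.
print(recon.shape, weights.shape)
```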
Citations: 0
MeAP: dual level memory strategy augmented transformer based visual object predictor
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-20 · DOI: 10.1016/j.displa.2026.103356
Shiliang Yan, Yinling Wang, Dandan Lu, Min Wang
Exploring and resolving persistent noise incursions within tracking sequences, especially occlusion, illumination variation, and fast motion, has garnered substantial attention for its role in enhancing the accuracy and robustness of visual object trackers. However, existing visual object trackers equipped with template-updating mechanisms or calibration strategies rely heavily on time-consuming historical data to achieve optimal tracking performance, impeding their real-time capability. To address these challenges, this paper introduces a long-short-term dual-level memory-augmented Transformer-based visual object predictor (MeAP). The key contributions of MeAP can be summarized as follows: 1) a noise model for specific invasion events is formulated from incursion effects, together with corresponding template strategies that serve as the foundation for more efficient memory utilization; 2) a memory exploration scheme based on an online tracking mask-based feature extraction strategy and a Transformer architecture is introduced to mitigate the impact of noise invasion during memory vector construction; 3) a memory utilization scheme based on target basic features and a dual-feature target mask predictor is provided, which supplies scene-edge features to the mask-based feature extraction method and jointly predicts the accurate location of the tracking target. Extensive experiments on the OTB100, NFS, VOT2021, and AVisT benchmarks demonstrate that MeAP, with its introduced modules, achieves tracking performance comparable to other state-of-the-art (SOTA) trackers, and operates at an average speed of 31 frames per second (FPS) across the four benchmarks.
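The long-short-term dual-level memory can be pictured with a generic sketch: a FIFO buffer keeps recent template features while an exponential moving average maintains a slowly updated long-term prototype. Both mechanisms and all sizes below are assumptions chosen to convey the idea, not MeAP's actual modules.

```python
# Illustrative dual-level (short/long-term) template memory for tracking (assumed design).
import collections
import torch


class DualLevelMemory:
    def __init__(self, short_len=5, momentum=0.99):
        self.short = collections.deque(maxlen=short_len)  # recent template features
        self.long = None                                   # slowly updated prototype
        self.momentum = momentum

    def update(self, template_feat, reliable=True):
        self.short.append(template_feat)
        if reliable:  # only confident frames refresh the long-term memory
            self.long = (
                template_feat if self.long is None
                else self.momentum * self.long + (1 - self.momentum) * template_feat
            )

    def read(self):
        short_mean = torch.stack(list(self.short)).mean(dim=0)
        return short_mean if self.long is None else 0.5 * (short_mean + self.long)


mem = DualLevelMemory()
for _ in range(10):
    mem.update(torch.randn(256))
print(mem.read().shape)  # torch.Size([256])
```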
Citations: 0
Omnidirectional image quality assessment via multi-perceptual feature fusion
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-11-26 · DOI: 10.1016/j.displa.2025.103302
Cheng Zhang , Shucun Si , Bo Zhang , Jiaying Wang
Omnidirectional images are integral to virtual reality (VR) applications, yet their high resolution and spatial complexity present unique challenges for quality assessment. Current omnidirectional image quality assessment (OIQA) techniques still struggle to extract multi-perceptual features and to model interrelationships across consecutive viewports, which makes it difficult to replicate the subjective perception of the human eye. In response, this research proposes a multi-perceptual feature aggregation-based omnidirectional image quality assessment approach. The method creates a pseudo-temporal input by transforming the equirectangular projection (ERP) omnidirectional image into a series of viewports, simulating a user's multi-viewport browsing journey. To improve frequency-domain feature extraction, the backbone network combines a convolutional neural network with 2D wavelet transform convolution (WTConv); this module decomposes the signal in the frequency domain while maintaining spatial information, making it easier to identify high-frequency features and structural defects in pictures. To better capture the continuous relationship between viewports, a temporal shift module (TSM) is added, which dynamically shifts viewport features along the channel dimension, thereby improving the model's perception of viewpoint continuity and spatial consistency. Additionally, the model incorporates the self-channel attention (SCA) mechanism to merge various perceptual characteristics and amplify salient feature expression, further improving the perception of important distortion regions. Experiments conducted on the OIQA and CVIQD standard datasets show that our proposed model achieves excellent performance compared with existing full-reference and no-reference methods.
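The temporal shift module (TSM) mentioned above has a simple, well-known form: a fraction of feature channels is shifted forward and backward along the pseudo-temporal (viewport) axis so that neighboring viewports exchange information. The sketch below assumes a 1/8 shift ratio, which is a common but not stated choice.

```python
# Minimal temporal shift over a viewport sequence (1/8 shift ratio is an assumption).
import torch


def temporal_shift(x, shift_div=8):
    # x: (B, T, C, H, W) features of T consecutive viewports.
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift one channel slice forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift another slice backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # remaining channels untouched
    return out


viewports = torch.randn(2, 8, 64, 32, 32)
print(temporal_shift(viewports).shape)  # torch.Size([2, 8, 64, 32, 32])
```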
Citations: 0
DepressionLLM: Emotion- and causality-aware depression detection with foundation models
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-04 · DOI: 10.1016/j.displa.2025.103304
Shiyu Teng , Jiaqing Liu , Hao Sun , Yue Huang , Rahul Kumar Jain , Shurong Chai , Ruibo Hou , Tomoko Tateyama , Lanfen Lin , Lang He , Yen-Wei Chen
Depression is a complex mental health issue that is often reflected through subtle multimodal signals in speech, facial expressions, and language. However, existing approaches using large language models (LLMs) face limitations in integrating these diverse modalities and providing interpretable insights, restricting their effectiveness in real-world and clinical settings. This study presents a novel framework that leverages foundation models for interpretable multimodal depression detection. Our approach follows a three-stage process. First, pseudo-labels enriched with emotional and causal cues are generated using a pretrained language model (GPT-4o), expanding the training signal beyond ground-truth labels. Second, a coarse-grained learning phase employs another model (Qwen2.5) to capture relationships among depression levels, emotional states, and inferred reasoning. Finally, a fine-grained tuning stage fuses video, audio, and text inputs via a multimodal prompt fusion module to construct a unified depression representation. We evaluate our framework on the E-DAIC, CMDC, and EATD benchmark datasets, demonstrating consistent improvements over state-of-the-art methods on both depression detection and causal reasoning tasks. By integrating foundation models with multimodal video understanding, our work offers a robust and interpretable solution for mental health analysis, contributing to the advancement of multimodal AI in clinical and real-world applications.
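One plausible reading of the multimodal prompt fusion stage is prefix-style fusion: per-modality features are projected into the language model's embedding space and prepended to the text tokens. The sketch below follows that reading; all dimensions, token counts, and the linear projections are assumptions rather than the paper's architecture.

```python
# Hedged sketch of prefix-style multimodal prompt fusion (assumed, not the paper's module).
import torch
import torch.nn as nn


class PromptFusion(nn.Module):
    def __init__(self, video_dim=512, audio_dim=256, llm_dim=1024, tokens_per_modality=4):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, llm_dim * tokens_per_modality)
        self.audio_proj = nn.Linear(audio_dim, llm_dim * tokens_per_modality)
        self.n = tokens_per_modality
        self.llm_dim = llm_dim

    def forward(self, video_feat, audio_feat, text_token_embeds):
        # video_feat: (B, video_dim), audio_feat: (B, audio_dim),
        # text_token_embeds: (B, L, llm_dim) embeddings from the LLM's embedding layer.
        b = video_feat.size(0)
        v = self.video_proj(video_feat).view(b, self.n, self.llm_dim)
        a = self.audio_proj(audio_feat).view(b, self.n, self.llm_dim)
        # Prepend modality "prompt tokens" so the LLM attends to them jointly with text.
        return torch.cat([v, a, text_token_embeds], dim=1)


fusion = PromptFusion()
out = fusion(torch.randn(2, 512), torch.randn(2, 256), torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 24, 1024])
```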
Citations: 0
Unleash and integrate the power of pre-trained ViTs via feature fusion for open-vocabulary object detection
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-15 · DOI: 10.1016/j.displa.2025.103321
Xiangyu Gao, Yu Dai, Taijin Zhao, Benliu Qiu, Lanxiao Wang, Heqian Qiu, Qingbo Wu, Hongliang Li
Alleviating over-fitting is one of the major concerns in open-vocabulary object detection (OVOD). Most OVOD methods rely on base-data training and inherit their model structure from closed-set detectors, with final predictions derived from the features extracted by the backbone. Backbone design therefore plays a key role in improving generalization capacity. However, existing works choose either a fully optimizable network or a single frozen visual encoder as the backbone, which limits the representation capacity of backbone features for OVOD and leads to sub-optimal performance. We therefore propose a novel multi-branch backbone network, named ViT-Feature-Modulated Multi-Scale Convolutional Network (VMCNet), which effectively integrates and unleashes the power of multiple pre-trained ViTs via the proposed feature fusion strategy. Drawing an analogy to the modulation mechanism in communication, we use an additional lightweight CNN branch to produce multi-scale carrier features, which then modulate the representations from the pre-trained ViTs to obtain the final detection features. Our method not only leverages the information from base data but also utilizes knowledge from multiple ViTs taken from CLIP and SAM, ensembling their knowledge and generalization ability for the OVOD setting. Equipped with the proposed backbone network, the detector achieves better performance on novel categories. Evaluated on two popular benchmarks, our method boosts detection performance on novel categories and outperforms state-of-the-art methods. On OV-COCO, the proposed method achieves 47.5 novel-category AP50 with ViT-B/16 and 52.8 with ViT-L/14; on OV-LVIS, VMCNet with ViT-B/16 reaches 27.7 mAPr.
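The "carrier feature" modulation can be pictured with a FiLM-style sketch in which a lightweight CNN branch predicts per-location scale and shift maps applied to frozen ViT features. The FiLM formulation and all sizes below are assumptions used only for illustration, not VMCNet's actual fusion strategy.

```python
# Sketch of CNN-carrier modulation of frozen ViT features (FiLM-style, assumed).
import torch
import torch.nn as nn


class CarrierModulation(nn.Module):
    def __init__(self, vit_dim=768, carrier_dim=256):
        super().__init__()
        self.carrier_cnn = nn.Sequential(                # trainable lightweight branch
            nn.Conv2d(3, carrier_dim, 3, stride=16, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_scale = nn.Conv2d(carrier_dim, vit_dim, 1)
        self.to_shift = nn.Conv2d(carrier_dim, vit_dim, 1)

    def forward(self, image, vit_feat):
        # image: (B, 3, H, W); vit_feat: (B, vit_dim, H/16, W/16) from a frozen ViT.
        carrier = self.carrier_cnn(image)
        scale, shift = self.to_scale(carrier), self.to_shift(carrier)
        return vit_feat * (1 + scale) + shift            # modulated detection features


mod = CarrierModulation()
img = torch.randn(1, 3, 256, 256)
vit_feat = torch.randn(1, 768, 16, 16)
print(mod(img, vit_feat).shape)  # torch.Size([1, 768, 16, 16])
```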
Citations: 0
Dual-channel image dehazing algorithm based on spatial-frequency domain feature enhancement
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-16 · DOI: 10.1016/j.displa.2025.103315
Jiameng Yu, Nan Xia, Xinmiao Yu
To address the limitations of traditional single-domain dehazing methods in balancing global scene recovery with local detail preservation, this paper proposes a dual-domain feature enhancement network (DFENet) that achieves precise haze removal and faithful image reconstruction through a cross-domain collaboration mechanism. In the spatial domain, we design two key modules: the global-local feature enhancement module (GLFEM) decouples features and employs joint channel-position attention to simultaneously optimize scene structure and texture detail, while the multiscale feature enhancement module (MSFE) dynamically adapts receptive fields to fuse multiscale features, enhancing robustness in complex scenes. In the frequency domain, we introduce the discrete cosine transform module (DCTM), which strategically learns to select among frequency channels while dynamically filtering and enhancing both high- and low-frequency components. Extensive experiments demonstrate that DFENet outperforms state-of-the-art (SOTA) methods, achieving a PSNR of 39.42 dB on the SOTS-indoor dataset and an improvement of 0.51 dB on SOTS-outdoor. It also performs well on real-world datasets, achieving PSNRs of 21.68 dB and 17.12 dB on NH-HAZE and Dense-Haze, respectively.
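The discrete cosine transform module (DCTM) can be approximated by a small sketch that moves features into the DCT domain per channel, reweights frequency coefficients with learnable gates, and transforms back. The gating scheme and shapes below are assumptions for illustration, not the paper's module.

```python
# Sketch of DCT-domain frequency gating (assumed design, not the paper's DCTM).
import math
import torch
import torch.nn as nn


def dct_matrix(n):
    # Orthonormal DCT-II basis: row k, column j.
    k = torch.arange(n).unsqueeze(1).float()
    j = torch.arange(n).unsqueeze(0).float()
    mat = torch.cos(math.pi * (2 * j + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    mat[0] = mat[0] / math.sqrt(2.0)
    return mat  # (n, n), mat @ mat.T == identity


class DCTGate(nn.Module):
    def __init__(self, channels, size):
        super().__init__()
        self.register_buffer("D", dct_matrix(size))
        self.gate = nn.Parameter(torch.ones(channels, size, size))  # per-frequency weights

    def forward(self, x):
        # x: (B, C, size, size)
        freq = self.D @ x @ self.D.t()        # 2D DCT over the spatial dims
        freq = freq * self.gate               # amplify or suppress frequency bands
        return self.D.t() @ freq @ self.D     # inverse transform back to the spatial domain


layer = DCTGate(channels=32, size=16)
print(layer(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 32, 16, 16])
```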
Citations: 0
Rethinking low-light image enhancement: A local–global synergy perspective
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-13 · DOI: 10.1016/j.displa.2026.103348
Qinghua Lin , Yu Long , Xudong Xiong , Wenchao Jiang , Zhihua Wang , Qiuping Jiang
Low-light image enhancement (LLIE) remains a challenging task due to complex degradations in illumination, contrast, and structural detail. Deep neural network-based approaches have shown promising results for LLIE. However, most existing methods rely either on convolutional layers with local receptive fields, which are well suited to restoring local textures, or on Transformer layers with long-range dependencies, which are better at correcting global illumination; despite their respective strengths, these approaches often struggle to handle both aspects simultaneously. In this paper, we revisit LLIE from a local–global synergy perspective and propose a unified framework, the Local–Global Synergy Network (LGS-Net). LGS-Net explicitly extracts local and global features in parallel using a separable CNN and a Swin Transformer block, respectively, effectively modeling both local structural fidelity and global illumination balance. The extracted features are then fed into a squeeze-and-excitation-based fusion module, which adaptively integrates multi-scale information guided by perceptual relevance. Extensive experiments on multiple real-world benchmarks show that our method consistently outperforms existing state-of-the-art methods on both quantitative metrics (e.g., PSNR, SSIM, Q-Align) and perceptual quality, with notable improvements in color fidelity and detail preservation under extreme low-light and non-uniform illumination.
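The squeeze-and-excitation-based fusion of the two branches admits a compact sketch: concatenate the local and global feature maps, squeeze them by global average pooling, excite channel weights with a small MLP, and project back. Channel sizes and the concatenate-then-reweight layout below are illustrative assumptions.

```python
# Minimal squeeze-and-excitation fusion of a local and a global branch (assumed sizes).
import torch
import torch.nn as nn


class SEFusion(nn.Module):
    def __init__(self, channels=64, reduction=8):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Linear(2 * channels, 2 * channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(2 * channels // reduction, 2 * channels),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (B, C, H, W) from the two parallel branches.
        x = torch.cat([local_feat, global_feat], dim=1)      # (B, 2C, H, W)
        squeeze = x.mean(dim=(2, 3))                         # global average pooling
        weights = self.excite(squeeze)[:, :, None, None]     # channel attention weights
        return self.proj(x * weights)                        # fused representation


fuse = SEFusion()
print(fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)).shape)
```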
Citations: 0
Endo-E2E-GS: End-to-end 3D reconstruction of endoscopic scenes using Gaussian Splatting
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2026-01-13 · DOI: 10.1016/j.displa.2026.103353
Xiongzhi Wang , Boyu Yang , Min Wei , Yu Chen , Jingang Zhang , Yunfeng Nie
Three-dimensional (3D) reconstruction is essential for enhancing spatial perception and geometric understanding in minimally invasive surgery. However, current methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) often rely on offline preprocessing, such as COLMAP-based point clouds or multi-frame fusion, limiting their adaptability and clinical deployment. We propose Endo-E2E-GS, a fully end-to-end framework that reconstructs structured 3D Gaussian fields directly from a single stereo endoscopic image pair. The system integrates (1) a DilatedResNet-based stereo depth estimator for robust geometry inference in low-texture scenes, (2) a Gaussian attribute predictor that infers per-pixel rotation, scale, and opacity, and (3) a differentiable splatting renderer for 2D view supervision. Evaluated on the ENDONERF and SCARED datasets, Endo-E2E-GS achieves highly competitive performance, reaching PSNR values of 38.874/33.052 and SSIM scores of 0.978/0.863, respectively, surpassing recent state-of-the-art approaches. It requires no explicit scene initialization and demonstrates consistent performance across the two representative endoscopic datasets. Code is available at: https://github.com/Intelligent-Imaging-Center/Endo-E2E-GS.
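The stereo-geometry step underlying the depth estimator follows the standard pinhole relation Z = f·B/d; the sketch below converts an assumed disparity map into metric depth and unprojects it to 3D points that could seed per-pixel Gaussians. The intrinsics and baseline are placeholders, not calibration values from the paper.

```python
# Worked sketch of disparity-to-depth-to-points stereo geometry (placeholder calibration).
import numpy as np

def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    # disparity: (H, W) in pixels; returns (H, W, 3) points in the camera frame.
    h, w = disparity.shape
    z = fx * baseline / np.clip(disparity, 1e-3, None)   # depth from stereo geometry Z = f*B/d
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * z / fx                                 # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

disp = np.random.uniform(5.0, 40.0, size=(480, 640))
points = disparity_to_points(disp, fx=500.0, fy=500.0, cx=320.0, cy=240.0, baseline=0.004)
print(points.shape)  # (480, 640, 3)
```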
Citations: 0
Real-time laser speckle myocardial blood flow imaging system in vivo
IF 3.4 · Tier 2 (Engineering & Technology) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2026-04-01 · Epub Date: 2025-12-18 · DOI: 10.1016/j.displa.2025.103328
Ren Bin Wang , Yuan Yuan , Hong Li Liu , Kai Jing Shang , Wei Nan Gao , Yong Bi , Yang Yu
Laser speckle contrast imaging (LSCI) is a real-time, full-field, non-contact imaging technique widely used for blood-flow visualization in biomedical applications. LSCI could potentially be applied to monitor the spatiotemporal evolution of the myocardial coronary arteries and thereby improve surgical quality in coronary artery bypass grafting (CABG). The functionality of a myocardial LSCI device has been demonstrated on animal hearts using offline data post-processing, but a corresponding real-time LSCI system, which would be of great value in assisting surgeons with intraoperative diagnosis during CABG, has not yet been realized. This paper develops a high-speed laser speckle myocardial blood flow imaging (LSMBFI) system for measuring blood-flow perfusion in real time. Through parallel computing and asynchronous programming, combined with the speckle contrast-to-blood-speed relationship, our LSMBFI system displays blood flow index (BFI) images of 1456 × 1088 pixels at a frame rate of 68 Hz; for smaller regions of interest, such as 1000 × 1000 pixels, the display frame rate reaches 120 Hz. Phantom and animal experiments are designed for validation. To the best of our knowledge, our LSMBFI system is the first to realize real-time monitoring of spatial and temporal myocardial blood-flow perfusion on the beating heart. The results will contribute to improving surgical quality control in CABG on large animals or humans and support the future engineering application of the LSMBFI instrument.
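The speckle contrast-to-blood-speed relationship referenced above is typically computed as K = sigma/mean over a sliding window, with the blood flow index approximated as BFI proportional to 1/K^2. The sketch below follows that standard LSCI practice; the window size and the 1/K^2 index are assumptions, not the authors' exact calibration.

```python
# Worked sketch of local speckle contrast and a 1/K^2 blood flow index (standard LSCI practice).
import numpy as np
from scipy.ndimage import uniform_filter

def blood_flow_index(raw_speckle, window=7, eps=1e-6):
    # raw_speckle: (H, W) raw speckle intensity image.
    mean = uniform_filter(raw_speckle, size=window)
    mean_sq = uniform_filter(raw_speckle ** 2, size=window)
    var = np.clip(mean_sq - mean ** 2, 0.0, None)     # local variance over the window
    k = np.sqrt(var) / (mean + eps)                   # local speckle contrast K
    return 1.0 / (k ** 2 + eps)                       # higher flow -> lower contrast -> higher BFI

frame = np.random.rand(1088, 1456).astype(np.float32)
bfi = blood_flow_index(frame)
print(bfi.shape)  # (1088, 1456)
```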
Citations: 0