Computers & Graphics-Uk: Latest Articles (Page 9)

Executing realistic earthquake simulations in unreal engine with material calibration

IF 2.5 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-24 DOI: 10.1016/j.cag.2024.104091

Yitong Sun , Hanchun Wang , Zhejun Zhang , Cyriel Diels , Ali Asadipour

Earthquakes significantly impact societies and economies, underscoring the need for effective search and rescue strategies. As AI and robotics increasingly support these efforts, the demand for high-fidelity, real-time simulation environments for training has become pressing. Earthquake simulation can be considered as a complex system. Traditional simulation methods, which primarily focus on computing intricate factors for single buildings or simplified architectural agglomerations, often fall short in providing realistic visuals and real-time structural damage assessments for urban environments. To address this deficiency, we introduce a real-time, high visual fidelity earthquake simulation platform based on the Chaos Physics System in Unreal Engine, specifically designed to simulate the damage to urban buildings. Initially, we use a genetic algorithm to calibrate material simulation parameters from Ansys into the Unreal Engine’s fracture system, based on real-world test standards. This alignment ensures the similarity of results between the two systems while achieving real-time capabilities. Additionally, by integrating real earthquake waveform data, we improve the simulation’s authenticity, ensuring it accurately reflects historical events. All functionalities are integrated into a visual user interface, enabling zero-code operation, which facilitates testing and further development by cross-disciplinary users. We verify the platform’s effectiveness through three AI-based tasks: similarity detection, path planning, and image segmentation. This paper builds upon the preliminary earthquake simulation study we presented at IMET 2023, with significant enhancements, including improvements to the material calibration workflow and the method for binding building foundations.
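The calibration idea in this abstract (search for fast-simulator parameters whose output matches a reference solver) can be illustrated with a toy genetic algorithm. This is only a minimal sketch under stated assumptions, not the authors' workflow: `reference_response`, `fast_response`, the two parameters, and the strain samples are all hypothetical stand-ins, and no Ansys or Unreal Engine API is involved.

```python
import random

# Hypothetical sample points at which both models are compared.
STRAINS = [i / 10 for i in range(1, 11)]

def reference_response(strain):
    # Hypothetical stand-in for a curve exported from a reference solver.
    return 30.0 * strain / (1.0 + 4.0 * strain)

def fast_response(params, strain):
    # Hypothetical real-time model with two tunable parameters.
    stiffness, softening = params
    return stiffness * strain / (1.0 + softening * strain)

def fitness(params):
    # Sum of squared differences between the two models (lower is better).
    return sum((fast_response(params, s) - reference_response(s)) ** 2
               for s in STRAINS)

def calibrate(generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(0.0, 100.0), rng.uniform(0.0, 10.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]          # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)       # pick two distinct parents
            # Average crossover plus Gaussian mutation, clamped to stay >= 0.
            child = tuple(max(0.0, (x + y) / 2 + rng.gauss(0.0, 0.5))
                          for x, y in zip(a, b))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = calibrate()
    print("calibrated parameters:", best, "error:", fitness(best))
```

Because the elite is carried over unchanged, the best error never increases from one generation to the next; how close the result gets to the reference depends on the mutation scale and the number of generations.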

Volume 124, Article 104091.

Citations: 0

Controlling the scatterplot shapes of 2D and 3D multidimensional projections

Pub Date : 2024-09-24 DOI: 10.1016/j.cag.2024.104093

Alister Machado, Alexandru Telea, Michael Behrisch

Multidimensional projections are effective techniques for depicting high-dimensional data. The point patterns created by such techniques, or a technique’s visual signature, depend (apart from the data themselves) on the technique design and its parameter settings. Controlling such visual signatures, something that only a few projections allow, can bring additional freedom for generating insightful depictions of the data. We present a novel projection technique, ShaRP, that allows explicit control over such visual signatures in terms of the shapes of similar-value point clusters (settable to rectangles, triangles, ellipses, and convex polygons) and the projection space (2D or 3D Euclidean or $S^{2}$). We show that ShaRP scales well computationally with dimensionality and dataset size, provides its signature control through a small set of parameters, allows trading off projection quality against signature enforcement, and can be used to generate decision maps to explore the behavior of trained machine-learning classifiers.

Volume 124, Article 104093.

Citations: 0

Ten years of immersive education: Overview of a Virtual and Augmented Reality course at postgraduate level

Pub Date : 2024-09-20 DOI: 10.1016/j.cag.2024.104088

Bernardo Marques, Beatriz Sousa Santos, Paulo Dias

In recent years, the market has seen the emergence of numerous affordable sensors, interaction devices, and displays, which have greatly facilitated the adoption of Virtual and Augmented Reality (VR/AR) across various applications. However, developing these applications requires a solid understanding of the field and specific technical skills, which are often lacking in current Computer Science and Engineering education programs. This work is an extended version of a Eurographics 2024 Education Paper and reports on a postgraduate-level course that has been taught for the past ten years to almost 200 students across several Master’s programs. The course introduces students to the fundamental principles, methods, and tools of VR/AR. Its primary objective is to equip students with the knowledge necessary to understand, create, implement, and evaluate applications using these technologies. The paper provides insights into the course structure, key topics covered, and assessment methods, as well as the devices and infrastructure utilized. It also includes an overview of various practical projects completed over the years. Among other reflections, we discuss the challenges of teaching this course, particularly the rapid evolution of the field, which necessitates constant updates to the curriculum. Finally, future perspectives for the course are outlined.

Volume 124, Article 104088. PDF: https://www.sciencedirect.com/science/article/pii/S0097849324002231/pdfft?md5=f05085791d28d06cef00928e6ebd0b31&pid=1-s2.0-S0097849324002231-main.pdf

Citations: 0

Efficient image generation with Contour Wavelet Diffusion

Pub Date : 2024-09-20 DOI: 10.1016/j.cag.2024.104087

Dimeng Zhang , JiaYao Li , Zilong Chen , Yuntao Zou

The burgeoning field of image generation has captivated academia and industry with its potential to produce high-quality images, facilitating applications like text-to-image conversion, image translation, and recovery. These advancements have notably propelled the growth of the metaverse, where virtual environments constructed from generated images offer new interactive experiences, especially in conjunction with digital libraries. The technology creates detailed high-quality images, enabling immersive experiences. Despite diffusion models showing promise with superior image quality and mode coverage over GANs, their slow training and inference speeds have hindered broader adoption. To counter this, we introduce the Contour Wavelet Diffusion Model, which accelerates the process by decomposing features and employing multi-directional, anisotropic analysis. This model integrates an attention mechanism to focus on high-frequency details and a reconstruction loss function to ensure image consistency and accelerate convergence. The result is a significant reduction in training and inference times without sacrificing image quality, making diffusion models viable for large-scale applications and enhancing their practicality in the evolving digital landscape.
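To make the "decomposing features" step more concrete, the sketch below shows a plain single-level 2D Haar wavelet transform and its inverse. This is a deliberately simpler, swapped-in technique: it is not the multi-directional, anisotropic decomposition the abstract describes, and the function names `haar2d` and `inverse_haar2d` are hypothetical. It only illustrates the general idea that an image can be split into four quarter-resolution subbands and reconstructed exactly, so a model could operate on smaller inputs.

```python
def haar2d(img):
    """Single-level 2D Haar transform of an even-sized grayscale image.

    `img` is a list of rows. Returns four subbands, each half the width
    and half the height of the input.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 4)  # local average (coarse image)
            r_lh.append((a - b + c - d) / 4)  # left-minus-right detail
            r_hl.append((a + b - c - d) / 4)  # top-minus-bottom detail
            r_hh.append((a - b - c + d) / 4)  # diagonal detail
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    """Rebuild the full-resolution image from the four Haar subbands."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top.extend([s + x + y + z, s - x + y - z])
            bottom.extend([s + x - y - z, s - x - y + z])
        out.append(top)
        out.append(bottom)
    return out

if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    subbands = haar2d(image)
    print("coarse subband:", subbands[0])
    print("reconstructed:", inverse_haar2d(*subbands))
```

Each subband holds a quarter of the original pixels, which is the usual reason wavelet-domain processing can reduce per-step cost; whether and how much it helps in a diffusion model depends on the architecture.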

Volume 124, Article 104087.

Citations: 0

Supporting motion-capture acting with collaborative Mixed Reality

Pub Date : 2024-09-19 DOI: 10.1016/j.cag.2024.104090

Alberto Cannavò, Francesco Bottino, Fabrizio Lamberti

Technologies such as chroma-key, LED walls, motion capture (mocap), 3D visual storyboards, and simulcams are revolutionizing how films featuring visual effects are produced. Despite their popularity, these technologies have introduced new challenges for actors. An increased workload is faced when digital characters are animated via mocap, since actors are requested to use their imagination to envision what characters see and do on set. This work investigates how Mixed Reality (MR) technology can support actors during mocap sessions by presenting a collaborative MR system named CoMR-MoCap, which allows actors to rehearse scenes by overlaying digital contents onto the real set. Using a Video See-Through Head Mounted Display (VST-HMD), actors can see digital representations of performers in mocap suits and digital scene contents in real time. The system supports collaboration, enabling multiple actors to wear both mocap suits to animate digital characters and VST-HMDs to visualize the digital contents. A user study involving 24 participants compared CoMR-MoCap to the traditional method using physical props and visual cues. The results showed that CoMR-MoCap significantly improved actors’ ability to position themselves and direct their gaze, and it offered advantages in terms of usability, spatial and social presence, embodiment, and perceived effectiveness over the traditional method.

Volume 124, Article 104090.

Citations: 0

LightingFormer: Transformer-CNN hybrid network for low-light image enhancement

Pub Date : 2024-09-18 DOI: 10.1016/j.cag.2024.104089

Cong Bi , Wenhua Qian , Jinde Cao , Xue Wang

Recent deep-learning methods have shown promising results in low-light image enhancement. However, current methods often suffer from noise and artifacts, and most are based on convolutional neural networks, which are limited in capturing long-range dependencies, resulting in insufficient recovery of extremely dark parts of low-light images. To tackle these issues, this paper proposes a novel Transformer-based low-light image enhancement network called LightingFormer. Specifically, we propose a novel Transformer-CNN hybrid block that captures global and local information via mixed attention. It combines the strength of the Transformer in capturing long-range dependencies with the strength of CNNs in extracting low-level features and enhancing locality, in order to recover extremely dark parts and enhance local details in low-light images. Moreover, we adopt the U-Net discriminator to enhance different regions in low-light images adaptively, avoiding overexposure or underexposure and suppressing noise and artifacts. Extensive experiments show that our method outperforms the state-of-the-art methods quantitatively and qualitatively. Furthermore, the application to object detection demonstrates the potential of our method in high-level vision tasks.

Volume 124, Article 104089.

Citations: 0

APE-GAN: A colorization method for focal areas of infrared images guided by an improved attention mask mechanism

Pub Date : 2024-09-18 DOI: 10.1016/j.cag.2024.104086

Wenchao Ren, Liangfu Li, Shiyi Wen, Lingmei Ai

Due to their minimal susceptibility to environmental changes, infrared images are widely applicable across various fields, particularly in the realm of traffic. Nonetheless, a common drawback of infrared images lies in their limited chroma and detail information, posing challenges for clear information retrieval. While extensive research has been conducted on colorizing infrared images in recent years, existing methods primarily focus on overall translation without adequately addressing the foreground area containing crucial details. To address this issue, we propose a novel approach that distinguishes and colors the foreground content with important information and the background content with less significant details separately before fusing them into a colored image. Consequently, we introduce an enhanced generative adversarial network based on Attention mask to meticulously translate the foreground content containing vital information more comprehensively. Furthermore, we have carefully designed a new composite loss function to optimize high-level detail generation and improve image colorization at a finer granularity. Detailed testing on IRVI datasets validates the effectiveness of our proposed method in solving the problem of infrared image coloring.

Volume 124, Article 104086.

Citations: 0

ST2SI: Image Style Transfer via Vision Transformer using Spatial Interaction

Pub Date : 2024-09-16 DOI: 10.1016/j.cag.2024.104084

Wenshu Li , Yinliang Chen , Xiaoying Guo , Xiaoyu He

Image style transfer renders a content image with a style image, retaining the original content structure while producing stylized images with artistic features. Because the content image contains different detail units and the style image has various style patterns, the stylized image is easily distorted. We propose a new Style Transfer based on Vision Transformer using Spatial Interaction (ST2SI), which takes advantage of Spatial Interactive Convolution (SIC) and Spatial Unit Attention (SUA) to further enhance the content and style representation, so that the encoder can not only better learn the features of the content domain and the style domain, but also maintain the structural integrity of the image content and the effective integration of style features. Concretely, the high-order spatial interaction ability of Spatial Interactive Convolution can capture complex style patterns, and Spatial Unit Attention can balance the content information of different detail units by changing the attention weights, thus addressing the problem of image distortion. Comprehensive qualitative and quantitative experiments demonstrate the efficacy of our approach.

Citations: 0

Editorial Note Computers & Graphics Issue 123

Pub Date : 2024-09-13 DOI: 10.1016/j.cag.2024.104072

Citations: 0

SHAPE: A visual computing pipeline for interactive landmarking of 3D photograms and patient reporting for assessing craniosynostosis

Pub Date : 2024-09-12 DOI: 10.1016/j.cag.2024.104056

Carsten Görg , Connor Elkhill , Jasmine Chaij , Kristin Royalty , Phuong D. Nguyen , Brooke French , Ines A. Cruz-Guerrero , Antonio R. Porras

3D photogrammetry is a cost-effective, non-invasive imaging modality that requires neither ionizing radiation nor sedation. It is therefore particularly valuable in pediatrics, where it supports the diagnosis and longitudinal study of craniofacial developmental pathologies such as craniosynostosis, the premature fusion of one or more cranial sutures, which results in local cranial growth restrictions and cranial malformations. Analysis of 3D photogrammetry requires the identification of craniofacial landmarks to segment the head surface and compute metrics that quantify anomalies. Unfortunately, commercial 3D photogrammetry software requires intensive manual landmark placement, which is time-consuming and prone to errors. We designed and implemented SHAPE, a System for Head-shape Analysis and Pediatric Evaluation. It integrates our previously developed automated landmarking method into a visual computing pipeline to evaluate a patient’s 3D photogram while allowing for manual confirmation and correction. It also automatically computes advanced metrics to quantify craniofacial anomalies and creates a report that can be uploaded to the patient’s electronic health record. We conducted a user study with a professional clinical photographer to compare SHAPE to the existing clinical workflow. We found that SHAPE allows for the evaluation of a craniofacial 3D photogram more than three times faster than the current clinical workflow ($3.85 \pm 0.99$ vs. $13.07 \pm 5.29$ minutes, $p < 0.001$). Our qualitative study findings indicate that the SHAPE workflow is well aligned with the existing clinical workflow and that SHAPE has useful features and is easy to learn.
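As a quick check of the "more than three times faster" claim, the sketch below works through the arithmetic on the two reported mean evaluation times. The variable names are illustrative, and only the means and standard deviations come from the abstract. The p-value cannot be recomputed here because the abstract does not give the sample size.

```python
# Reported evaluation times in minutes (mean, standard deviation).
shape_mean, shape_sd = 3.85, 0.99
clinical_mean, clinical_sd = 13.07, 5.29

# Ratio of mean times: how many times faster SHAPE is on average.
speedup = clinical_mean / shape_mean

# Absolute and relative time saved per evaluation.
saved_minutes = clinical_mean - shape_mean
reduction_pct = 100 * saved_minutes / clinical_mean

print(f"speedup: {speedup:.2f}x")              # speedup: 3.39x
print(f"time saved: {saved_minutes:.2f} min")  # time saved: 9.22 min
print(f"reduction: {reduction_pct:.1f}%")      # reduction: 70.5%
```

The ratio of about 3.39 is consistent with the abstract's wording, and it corresponds to roughly a 70% reduction in mean evaluation time.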

Citations: 0