Latest Publications from The Visual Computer

Deformable shape matching with multiple complex spectral filter operator preservation
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03487-z
Qinsong Li, Yueyu Guo, Xinru Liu, Ling Hu, Feifan Luo, Shengjun Liu

The functional maps framework has achieved remarkable success in non-rigid shape matching. However, traditional functional map representations do not explicitly encode surface orientation, which can easily lead to orientation-reversing correspondences. The complex functional map addresses this issue by linking oriented tangent bundles to favor orientation-preserving correspondences. Nevertheless, the absence of effective restrictions on complex functional maps prevents them from producing high-quality correspondences. To this end, we introduce novel and powerful constraints for determining complex functional maps by incorporating multiple complex spectral filter operator preservation constraints with a rigorous theoretical guarantee. These constraints encode surface orientation information and enforce the isometric property of the map. Based on them, we propose a novel and efficient method that obtains orientation-preserving and accurate correspondences across shapes by alternately updating the functional maps, complex functional maps, and pointwise maps. Extensive experiments demonstrate significant improvements in correspondence quality and computational efficiency. In addition, our constraints can easily be adapted to other functional maps-based methods to enhance their performance.
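
For readers unfamiliar with the machinery behind this alternating scheme, the sketch below shows the two classical building blocks it iterates: a least-squares fit of a functional map between reduced Laplace-Beltrami eigenbases, and nearest-neighbor recovery of a pointwise map. It assumes NumPy/SciPy and precomputed eigenbases; the paper's complex functional maps and spectral filter operator constraints are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def fit_functional_map(A, B, lam=1e-3):
    # A, B: (k, d) descriptor coefficients in the source/target eigenbases.
    # Ridge-regularized least squares: argmin_C ||C A - B||^2 + lam ||C||^2.
    k = A.shape[0]
    return B @ A.T @ np.linalg.inv(A @ A.T + lam * np.eye(k))

def pointwise_from_functional(C, phi_src, phi_tgt):
    # Recover a point-to-point map by matching each target point's spectral
    # embedding (rows of phi_tgt) to the nearest transported source
    # embedding (rows of phi_src @ C.T).
    tree = cKDTree(phi_src @ C.T)
    _, idx = tree.query(phi_tgt)
    return idx  # idx[j] = index of the source vertex matched to target vertex j
```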

Citations: 0
Refined tri-directional path tracing with generated light portal
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03464-6
Xuchen Wei, GuiYang Pu, Yuchi Huo, Hujun Bao, Rui Wang

The rendering efficiency of Monte Carlo path tracing often depends on the ease of path construction. For scenes with particularly complex visibility, e.g. where the camera and light sources are placed in separate rooms connected by narrow doorways or windows, it is difficult to construct valid paths using traditional algorithms such as unidirectional or bidirectional path tracing. Light portals are a class of methods that assist in sampling direct light paths based on prior knowledge of the scene; they usually require additional manual editing and labelling by the artist or renderer user. Tri-directional path tracing is a sophisticated algorithm that combines bidirectional path tracing with light portal sampling, but the original work lacks sufficient analysis to demonstrate its effectiveness. In this paper, we propose an automatic light portal generation algorithm based on spatial radiosity analysis that mitigates the cost of manual operations for complex scenes. We also further analyse and improve the light portal-based tri-directional path tracing algorithm, giving a detailed analysis of path construction strategies, algorithm complexity, and the unbiasedness of the Monte Carlo estimation. The experimental results show that our algorithm can accurately locate light portals at low computational cost and effectively improve the rendering performance of complex scenes.
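
To make the role of a portal concrete, here is a minimal NumPy sketch of a direct-lighting estimator that samples a rectangular portal uniformly by area and applies the usual area-to-solid-angle change of measure. The portal geometry, the placeholder radiance_fn, and the sample count are illustrative assumptions, not the paper's generated portals.

```python
import numpy as np

def portal_direct_light(x, n, portal_o, portal_u, portal_v, radiance_fn, n_samples=64):
    # Estimate incident radiance at shading point x (normal n) by sampling
    # the rectangle {portal_o + s*portal_u + t*portal_v : s, t in [0, 1]}.
    scaled_n = np.cross(portal_u, portal_v)
    area = np.linalg.norm(scaled_n)
    portal_n = scaled_n / area
    est = 0.0
    for _ in range(n_samples):
        s, t = np.random.rand(2)
        y = portal_o + s * portal_u + t * portal_v   # uniform point on the portal
        d = y - x
        r2 = d @ d
        w = d / np.sqrt(r2)                          # unit direction x -> y
        cos_x = max(w @ n, 0.0)                      # cosine at the shading point
        cos_p = abs(w @ portal_n)                    # cosine at the portal
        # pdf is 1/area, so dividing by it multiplies by the portal area;
        # cos_p / r2 converts the area measure to solid angle.
        est += radiance_fn(y, -w) * cos_x * cos_p * area / r2
    return est / n_samples
```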

Citations: 0
Mem-Box: VR sandbox for adaptive working memory evaluation and training using physiological signals
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03539-4
Anqi Chen, Ming Li, Yang Gao

Working memory is crucial for higher cognitive functions in humans and is a focus of cognitive rehabilitation. Compared to conventional working memory training methods, VR-based training provides a more immersive experience with realistic scenarios, offering enhanced transferability to daily life. However, existing VR-based training methods often focus on basic cognitive tasks, underutilize VR's realism, and rely heavily on subjective assessment methods. In this paper, we introduce Mem-Box, a VR sandbox for working memory training and evaluation that simulates everyday life scenarios and routines and adaptively adjusts task difficulty based on user performance. We conducted a training experiment with the Mem-Box and compared it with a control group undergoing PC-based training. The results of the Stroop test indicate that both groups improved their working memory abilities, with Mem-Box training showing greater efficacy. Physiological data confirmed the effectiveness of the Mem-Box, as we observed lower HRV and SDNN. Furthermore, the frequency-domain analysis indicates higher sympathetic nervous system activity (LF power and LF/HF) during Mem-Box training, which is related to the higher sense of presence in VR. These metrics pave the way for building adaptive VR systems based on physiological data.
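
For reference, the sketch below computes the reported metrics (SDNN and LF/HF) from a series of RR intervals, using the conventional 0.04-0.15 Hz LF and 0.15-0.40 Hz HF bands and an assumed 4 Hz resampling rate; it is a generic HRV pipeline, not the Mem-Box implementation.

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d
from scipy.integrate import trapezoid

def hrv_metrics(rr_ms):
    # rr_ms: successive inter-beat (RR) intervals in milliseconds.
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                    # time-domain HRV: SDNN
    # Frequency domain: resample the irregular RR series evenly (4 Hz assumed),
    # then estimate the power spectrum with Welch's method.
    t = np.cumsum(rr) / 1000.0               # beat times in seconds
    fs = 4.0
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    rr_even = interp1d(t, rr, kind="cubic")(grid)
    f, psd = welch(rr_even - rr_even.mean(), fs=fs, nperseg=min(256, len(grid)))
    lf_band = (f >= 0.04) & (f < 0.15)       # conventional LF band
    hf_band = (f >= 0.15) & (f < 0.40)       # conventional HF band
    lf = trapezoid(psd[lf_band], f[lf_band])
    hf = trapezoid(psd[hf_band], f[hf_band])
    return {"SDNN": sdnn, "LF": lf, "HF": hf, "LF/HF": lf / hf}
```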

Citations: 0
FPO++: efficient encoding and rendering of dynamic neural radiance fields by analyzing and enhancing Fourier PlenOctrees
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03475-3
Saskia Rabich, Patrick Stotko, Reinhard Klein

Fourier PlenOctrees have been shown to be an efficient representation for real-time rendering of dynamic neural radiance fields (NeRF). Despite its many advantages, this method suffers from artifacts introduced by the underlying compression when combined with recent state-of-the-art techniques for training the static per-frame NeRF models. In this paper, we perform an in-depth analysis of these artifacts and leverage the resulting insights to propose an improved representation. In particular, we present a novel density encoding that adapts the Fourier-based compression to the characteristics of the transfer function used by the underlying volume rendering procedure, leading to a substantial reduction of artifacts in the dynamic model. We demonstrate the effectiveness of our enhanced Fourier PlenOctrees through quantitative and qualitative evaluations on synthetic and real-world scenes.
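
To see where such compression artifacts originate, consider this minimal sketch that stores a truncated real-FFT of one leaf's time-varying density and reconstructs it per frame, which is the Fourier PlenOctree idea reduced to one dimension; the paper's improved, transfer-function-aware density encoding is not reproduced here.

```python
import numpy as np

def compress_density(density_t, k):
    # density_t: (T,) density of one voxel/leaf over T frames.
    # Keep only the k lowest-frequency real-FFT coefficients, i.e. a
    # truncated Fourier series of the time-varying density.
    coeffs = np.fft.rfft(density_t)
    return coeffs[:k], len(density_t)

def query_density(coeffs, T, frame):
    # Reconstruct the truncated series and evaluate it at one frame.
    full = np.zeros(T // 2 + 1, dtype=complex)
    full[: len(coeffs)] = coeffs
    return np.fft.irfft(full, n=T)[frame]
```

Shrinking k exaggerates ringing around sharp temporal density changes, which is the kind of artifact the analysis above targets.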

Citations: 0
Multi-feature fusion enhanced monocular depth estimation with boundary awareness
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03498-w
Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang

Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional methods for single-image depth estimation face limitations in the photometric loss: a lack of geometric constraints, reliance on pixel-level intensity or color differences, and the assumption of perfect photometric consistency. These lead to errors in challenging conditions and produce overly smooth depth maps that fail to capture object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are fed into our decoder and enhanced with attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. To mitigate the weakening of semantic features during convolution, we introduce a feature-enhanced combination strategy, and we integrate the DeconvUp module to improve the restoration of depth map boundaries. We further introduce a boundary loss that enforces constraints between object boundaries, and propose an extended evaluation method that utilizes Laplacian pyramid residuals to assess boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate the superior performance of MFFENet compared to state-of-the-art models in monocular depth estimation.
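
As a reference for the boundary signal mentioned above, this sketch builds Laplacian pyramid residuals with OpenCV; each level is the band-pass difference between a Gaussian level and its upsampled coarser version, which concentrates on edges and object boundaries. It is the generic construction, not MFFENet's exact module.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    # Each level stores the band-pass residual between a Gaussian level and
    # the upsampled next-coarser level; these residuals concentrate on edges
    # and object boundaries.
    cur = img.astype(np.float32)
    pyr = []
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)   # high-frequency residual at this scale
        cur = down
    pyr.append(cur)            # low-frequency base image
    return pyr
```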

Citations: 0
Road crack detection using pixel classification and intensity-based distinctive fuzzy C-means clustering
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03470-8
Munish Bhardwaj, Nafis Uddin Khan, Vikas Baghel

Road cracks are quickly becoming one of the world's most serious concerns. They affect traffic safety and increase the likelihood of road accidents, and a significant amount of money is spent each year on road repair and upkeep. This cost can be lowered if cracks are discovered in good time, but manual detection is slow and imprecise. Because of ambient noise, intensity inhomogeneity, and low contrast, crack identification is a complex task for automatic processes, and several techniques have been developed in the past to pinpoint the exact site of a crack. In this research, a novel fuzzy C-means clustering algorithm is proposed that detects cracks automatically by adding optimal edge pixels, utilizing a second-order difference together with intensity-based edge and non-edge fuzzy factors. The technique provides information about the intensity of edge and non-edge pixels, allowing it to recognize edges even when the image has little contrast. The method requires no data set to train a model and no critical parameter optimization, so it can recognize edges or fissures even in novel or previously unseen input images from different environments. The experimental results reveal that the fuzzy C-means clustering-based segmentation method beats many existing methods for detecting alligator, transverse, and longitudinal cracks in road photos in terms of precision, recall, F1 score, PSNR, and execution time.
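
For context, this sketch implements the standard fuzzy C-means iteration on raw pixel intensities, which is the baseline the proposed method extends; the second-order-difference and edge/non-edge fuzzy factors described above are not included.

```python
import numpy as np

def fuzzy_cmeans(x, c=2, m=2.0, iters=100, tol=1e-5, seed=0):
    # x: flat array of pixel intensities; c: clusters; m: fuzzifier (> 1).
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                        # memberships sum to 1 per pixel
    for _ in range(iters):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)   # membership-weighted centers
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u_new = 1.0 / (d ** (2 / (m - 1)) * (d ** (-2 / (m - 1))).sum(axis=0))
        if np.abs(u_new - u).max() < tol:
            return centers, u_new
        u = u_new
    return centers, u
```

Thresholding the membership map of the darker cluster then gives a rough crack mask.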

Citations: 0
Development and validation of a real-time vision-based automatic HDMI wire-split inspection system
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03436-w
Yu-Chen Chiu, Chi-Yi Tsai, Po-Hsiang Chang

In the production process of HDMI cables, manual intervention is often required, resulting in low production efficiency and long processing times. This paper presents a real-time vision-based automatic inspection system for HDMI cables that reduces the labor required in the production process. The system comprises hardware and software designs. Since the wires in HDMI cables are tiny objects, the hardware design includes an image capture platform with a high-resolution camera and a ring light source to acquire high-resolution, high-quality images of the wires. The software design includes a data augmentation system and an automatic HDMI wire-split inspection system. The former increases the number and diversity of training samples; the latter detects the coordinate position of each wire center and the corresponding Pin-ID (pid) number and outputs the results to the wire-bonding machine for subsequent tasks. In addition, a new HDMI cable dataset is created to train and evaluate a series of existing detection network models for this study. The experimental results show that the detection accuracy of the wire center using the existing YOLOv4 detector reaches 99.9%. Furthermore, the proposed system reduces execution time by about 38.67% compared with the traditional manual wire-split inspection operation.
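
A small sketch of the kind of post-processing such a pipeline needs is shown below: it converts detected wire bounding boxes into center coordinates and assigns pid numbers left to right. The left-to-right ordering is an assumption for illustration; the paper's detector may infer pids differently.

```python
def assign_pids(boxes):
    # boxes: list of (x1, y1, x2, y2) wire detections from the detector.
    # Compute each wire's center and assign Pin-IDs left to right, the kind
    # of step needed before sending coordinates to the wire-bonding machine.
    centers = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
    order = sorted(range(len(centers)), key=lambda i: centers[i][0])
    return [(pid + 1, centers[i]) for pid, i in enumerate(order)]
```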

Citations: 0
Robust point cloud normal estimation via multi-level critical point aggregation
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03532-x
Jun Zhou, Yaoshun Li, Mingjie Wang, Nannan Li, Zhiyang Li, Weixiao Wang

We propose a multi-level critical point aggregation architecture based on a graph attention mechanism for 3D point cloud normal estimation, which can efficiently focus on locally important points during feature extraction. In this architecture, the local feature aggregation (LFA) module and the global feature refinement (GFR) module are designed to accurately identify critical points that lie geometrically closer to the tangent plane for surface fitting, at both local and global levels. Specifically, the LFA module captures significant local information from neighboring points with strong geometric correlations to the query point in the low-level feature space. The GFR module enhances the exploration of global geometric correlations in the high-level feature space, allowing the network to focus precisely on critical global points. To address indistinguishable features in the low-level space, we implement a stacked LFA structure, which transfers essential adjacent information across multiple levels and enables deep feature aggregation layer by layer. The GFR module then leverages this robust local geometric information and refines it into comprehensive global features. Our multi-level point-aware architecture improves the stability and accuracy of surface fitting and normal estimation, even in the presence of sharp features, high noise, or anisotropic structures. Experimental results demonstrate that our method is competitive and achieves stable performance on both synthetic and real-world datasets. Code is available at https://github.com/CharlesLee96/NormalEstimation.
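
For contrast, here is the classical unweighted PCA baseline that critical-point-aware methods refine: it fits a plane to each point's k nearest neighbors and takes the smallest principal axis as the normal. Because every neighbor is weighted equally, it degrades near sharp features and noise, which is precisely where identifying critical points helps.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    # points: (n, 3) array. For each point, gather its k nearest neighbors,
    # form the neighborhood covariance, and take the eigenvector with the
    # smallest eigenvalue (orthogonal to the best-fit tangent plane).
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points, dtype=float)
    for i, nbrs in enumerate(idx):
        q = points[nbrs] - points[nbrs].mean(axis=0)
        w, v = np.linalg.eigh(q.T @ q)   # eigenvalues in ascending order
        normals[i] = v[:, 0]             # smallest principal axis = normal
    return normals
```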

Citations: 0
MCLGAN: a multi-style cartoonization method based on style condition information
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03550-9
Canlin Li, Xinyue Wang, Ran Yi, Wenjiao Zhang, Lihua Bi, Lizhuang Ma

Image cartoonization, a special kind of style transformation, is a challenging image processing task. Most existing cartoonization methods target a single style; achieving multi-style transformation requires training multiple models, which is time- and resource-consuming. Meanwhile, existing multi-style cartoonization methods based on generative adversarial networks require multiple discriminators to handle different styles, which increases the complexity of the network. To solve these issues, this paper proposes MCLGAN, an image cartoonization method for multi-style transformation based on style condition information. The approach integrates two key components to promote multi-style image cartoonization. First, we design a conditional generator and a multi-style learning discriminator to embed style condition information into the feature space, enhancing the model's ability to realize different cartoon styles. Then a new loss mechanism, the conditional contrastive loss, is used strategically to strengthen the difference between styles, effectively realizing multi-style image cartoonization. At the same time, MCLGAN simplifies the cartoonization of images in different styles: the model only needs to be trained once, which significantly improves efficiency. Numerous experiments verify the validity of our method and demonstrate its superiority over previous methods.
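
The abstract does not spell out the loss, so the following PyTorch sketch shows one plausible supervised-contrastive form of a conditional contrastive objective over style labels, in which same-style embeddings attract and different-style embeddings repel; the function name, temperature, and exact formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def conditional_contrastive_loss(feats, style_labels, tau=0.1):
    # feats: (N, D) embeddings; style_labels: (N,) integer style ids.
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / tau                                   # cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = style_labels[:, None].eq(style_labels[None, :]) & ~self_mask
    # mean log-probability of same-style pairs per anchor; anchors without
    # positives contribute zero via the clamp.
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```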

Citations: 0
Digital human and embodied intelligence for sports science: advancements, opportunities and prospects
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03547-4
Xiang Suo, Weidi Tang, Lijuan Mao, Zhen Li

This paper presents a comprehensive review of state-of-the-art motion capture techniques for digital human modeling in sports, including traditional optical motion capture systems, wearable sensor capture systems, computer vision capture systems, and fusion motion capture systems. The review explores the strengths, limitations, and applications of each technique in the context of sports science, such as performance analysis, technique optimization, injury prevention, and interactive training. The paper highlights the significance of accurate and comprehensive motion data acquisition for creating high-fidelity digital human models that can replicate an athlete’s movements and biomechanics. However, several challenges and limitations are identified, such as limited capture volume, marker occlusion, accuracy limitations, lack of diverse datasets, and computational complexity. To address these challenges, the paper emphasizes the need for collaborative efforts from researchers and practitioners across various disciplines. By bridging theory and practice and identifying application-specific challenges and solutions, this review aims to facilitate cross-disciplinary collaboration and guide future research and development efforts in harnessing the power of digital human technology for sports science advancement, ultimately unlocking new possibilities for athlete performance optimization and health.

Citations: 0