Title: Deformable shape matching with multiple complex spectral filter operator preservation
Authors: Qinsong Li, Yueyu Guo, Xinru Liu, Ling Hu, Feifan Luo, Shengjun Liu
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03487-z

The functional maps framework has achieved remarkable success in non-rigid shape matching. However, traditional functional map representations do not explicitly encode surface orientation, which can easily lead to orientation-reversing correspondences. The complex functional map addresses this issue by linking oriented tangent bundles to favor orientation-preserving correspondences. Nevertheless, the absence of effective restrictions on complex functional maps prevents them from producing high-quality correspondences. To this end, we introduce novel and powerful constraints for determining complex functional maps by incorporating multiple complex spectral filter operator preservation constraints, with a rigorous theoretical guarantee. These constraints encode surface orientation information and enforce the isometric property of the map. Based on these constraints, we propose a novel and efficient method that obtains orientation-preserving and accurate correspondences across shapes by alternately updating the functional maps, complex functional maps, and pointwise maps. Extensive experiments demonstrate significant improvements in correspondence quality and computational efficiency. In addition, our constraints can easily be adapted to other functional maps-based methods to enhance their performance.
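The alternating scheme in the last step extends the standard real-valued functional-map loop: descriptor-driven least squares for the map, then nearest neighbors in the aligned spectral embedding for the pointwise map. Below is a minimal sketch of that baseline loop, assuming precomputed Laplace-Beltrami eigenbases phi_src, phi_tgt and spectral descriptor coefficients A, B; the paper's complex maps and filter-operator constraints are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def solve_fmap(A, B):
    """Least-squares functional map C (k x k) with C @ A ~= B, where A, B are
    spectral descriptor coefficients (k x d) on source and target."""
    C, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)  # solves A^T C^T ~= B^T
    return C.T

def fmap_to_p2p(C, phi_src, phi_tgt):
    """Convert a functional map to a pointwise map: target vertex i is matched
    to the source vertex whose spectral embedding is nearest to phi_tgt[i] @ C."""
    tree = cKDTree(phi_src)
    _, T = tree.query(phi_tgt @ C)
    return T

def p2p_to_fmap(T, phi_src, phi_tgt):
    """Recover C from a pointwise map via least squares: phi_tgt @ C ~= phi_src[T]."""
    C, *_ = np.linalg.lstsq(phi_tgt, phi_src[T], rcond=None)
    return C

def alternate_refine(phi_src, phi_tgt, A, B, n_iters=10):
    """Alternate spectral and pointwise updates, in the spirit of the paper's
    alternating scheme (real-valued variant only)."""
    C = solve_fmap(A, B)
    T = fmap_to_p2p(C, phi_src, phi_tgt)
    for _ in range(n_iters):
        C = p2p_to_fmap(T, phi_src, phi_tgt)
        T = fmap_to_p2p(C, phi_src, phi_tgt)
    return C, T
```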
{"title":"Deformable shape matching with multiple complex spectral filter operator preservation","authors":"Qinsong Li, Yueyu Guo, Xinru Liu, Ling Hu, Feifan Luo, Shengjun Liu","doi":"10.1007/s00371-024-03487-z","DOIUrl":"https://doi.org/10.1007/s00371-024-03487-z","url":null,"abstract":"<p>The functional maps framework has achieved remarkable success in non-rigid shape matching. However, the traditional functional map representations do not explicitly encode surface orientation, which can easily lead to orientation-reversing correspondence. The complex functional map addresses this issue by linking oriented tangent bundles to favor orientation-preserving correspondence. Nevertheless, the absence of effective restrictions on the complex functional maps hinders them from obtaining high-quality correspondences. To this end, we introduce novel and powerful constraints to determine complex functional maps by incorporating multiple complex spectral filter operator preservation constraints with a rigorous theoretical guarantee. Such constraints encode the surface orientation information and enforce the isometric property of the map. Based on these constraints, we propose a novel and efficient method to obtain orientation-preserving and accurate correspondences across shapes by alternatively updating the functional maps, complex functional maps, and pointwise maps. Extensive experiments demonstrate our significant improvements in correspondence quality and computing efficiency. In addition, our constraints can be easily adapted to other functional maps-based methods to enhance their performance.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Refined tri-directional path tracing with generated light portal
Authors: Xuchen Wei, GuiYang Pu, Yuchi Huo, Hujun Bao, Rui Wang
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03464-6
The rendering efficiency of Monte Carlo path tracing often depends on the ease of path construction. For scenes with particularly complex visibility, e.g., where the camera and light sources are placed in separate rooms connected by narrow doorways or windows, it is difficult to construct valid paths using traditional algorithms such as unidirectional or bidirectional path tracing. Light portals are a class of methods that assist in sampling direct light paths based on prior knowledge of the scene; they usually require additional manual editing and labelling by the artist or renderer user. Tri-directional path tracing is a sophisticated algorithm that combines bidirectional path tracing with light portal sampling, but the original work lacks sufficient analysis to demonstrate its effectiveness. In this paper, we propose an automatic light portal generation algorithm based on spatial radiosity analysis that mitigates the cost of manual operations for complex scenes. We also further analyse and improve the light portal-based tri-directional path tracing algorithm, giving a detailed analysis of path construction strategies, algorithm complexity, and the unbiasedness of the Monte Carlo estimation. Experimental results show that our algorithm can accurately locate light portals at low computational cost and effectively improve the rendering performance of complex scenes.
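To make the portal mechanism concrete, here is a minimal sketch of direct lighting estimated through a parallelogram portal: the portal is area-sampled, and the cosine and inverse-square terms convert the area pdf to the solid-angle measure of the rendering equation. The scene hooks Li (radiance arriving at the portal point) and visible (a shadow-ray test) are hypothetical placeholders, and this is plain portal sampling, not the paper's tri-directional connection strategy; inputs are numpy 3-vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_portal(corner, edge_u, edge_v):
    """Uniformly area-sample a parallelogram portal; returns point, unit normal, pdf."""
    u, v = rng.random(2)
    p = corner + u * edge_u + v * edge_v
    n = np.cross(edge_u, edge_v)
    area = np.linalg.norm(n)
    return p, n / area, 1.0 / area

def portal_direct_light(x, n_x, corner, edge_u, edge_v, Li, visible, n_samples=64):
    """Monte Carlo estimate of direct illumination at shading point x (normal n_x)
    arriving through the portal; the surface BRDF is folded out for brevity."""
    total = 0.0
    for _ in range(n_samples):
        p, n_p, pdf = sample_portal(corner, edge_u, edge_v)
        d = p - x
        r2 = float(d @ d)
        w = d / np.sqrt(r2)
        cos_x = max(0.0, float(n_x @ w))   # cosine at the shading point
        cos_p = abs(float(n_p @ w))        # cosine at the portal
        if cos_x > 0.0 and visible(x, p):
            total += Li(p, w) * cos_x * cos_p / (r2 * pdf)
    return total / n_samples
```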
{"title":"Refined tri-directional path tracing with generated light portal","authors":"Xuchen Wei, GuiYang Pu, Yuchi Huo, Hujun Bao, Rui Wang","doi":"10.1007/s00371-024-03464-6","DOIUrl":"https://doi.org/10.1007/s00371-024-03464-6","url":null,"abstract":"<p>The rendering efficiency of Monte Carlo path tracing often depends on the ease of path construction. For scenes with particularly complex visibility, e.g. where the camera and light sources are placed in separate rooms connected by narrow doorways or windows, it is difficult to construct valid paths using traditional path tracing algorithms such as unidirectional path tracing or bidirectional path tracing. Light portal is a class of methods that assist in sampling direct light paths based on prior knowledge of the scene. It usually requires additional manual editing and labelling by the artist or renderer user. Tri-directional path tracing is a sophisticated path tracing algorithm that combines bidirectional path tracing and light portals sampling, but the original work lacks sufficient analysis to demonstrate its effectiveness. In this paper, we propose an automatic light portal generation algorithm based on spatial radiosity analysis that mitigates the cost of manual operations for complex scenes. We also further analyse and improve the light portal-based tri-directional path tracing rendering algorithm, giving a detailed analysis of path construction strategies, algorithm complexity, and the unbiasedness of the Monte Carlo estimation. The experimental results show that our algorithm can accurately locate the light portals with low computational cost and effectively improve the rendering performance of complex scenes.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"146 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Mem-Box: VR sandbox for adaptive working memory evaluation and training using physiological signals
Authors: Anqi Chen, Ming Li, Yang Gao
Pub Date: 2024-06-25 | DOI: 10.1007/s00371-024-03539-4
Working memory is crucial for higher cognitive functions in humans and is a focus of cognitive rehabilitation. Compared to conventional working memory training methods, VR-based training provides a more immersive experience with realistic scenarios, offering enhanced transferability to daily life. However, existing VR-based training methods often focus on basic cognitive tasks, underutilize VR's realism, and rely heavily on subjective assessment. In this paper, we introduce MEM-Box, a VR sandbox for working memory training and evaluation that simulates everyday scenarios and routines and adaptively adjusts task difficulty based on user performance. We conducted a training experiment using the MEM-Box and compared it with a control group undergoing PC-based training. Results of the Stroop test indicate that both groups improved their working memory abilities, with MEM-Box training showing greater efficacy. Physiological data confirmed the effectiveness of the MEM-Box, as we observed lower HRV and SDNN. Furthermore, the frequency-domain analysis indicates higher sympathetic nervous system activity (LF power and LF/HF) during MEM-Box training, which is related to the higher sense of presence in VR. These metrics pave the way for building adaptive VR systems driven by physiological data.
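The named measures are standard heart rate variability statistics computable from beat-to-beat (RR) intervals. Below is a minimal sketch of one common recipe: SDNN plus Welch-based LF power and LF/HF over the usual 0.04-0.15 Hz and 0.15-0.4 Hz bands. The paper's exact preprocessing pipeline is not specified in the abstract and may differ.

```python
import numpy as np
from scipy.signal import welch

def hrv_metrics(rr_ms, fs=4.0):
    """SDNN (ms) plus LF band power and LF/HF ratio from RR intervals in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)
    # Welch PSD needs uniform sampling: resample the irregular tachogram to fs Hz.
    t = np.cumsum(rr) / 1000.0                     # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    tach = np.interp(grid, t, rr)
    f, pxx = welch(tach - tach.mean(), fs=fs, nperseg=min(256, grid.size))
    df = f[1] - f[0]
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum() * df  # low-frequency band power
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum() * df  # high-frequency band power
    return sdnn, lf, lf / hf
```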
{"title":"Mem-Box: VR sandbox for adaptive working memory evaluation and training using physiological signals","authors":"Anqi Chen, Ming Li, Yang Gao","doi":"10.1007/s00371-024-03539-4","DOIUrl":"https://doi.org/10.1007/s00371-024-03539-4","url":null,"abstract":"<p>Working memory is crucial for higher cognitive functions in humans and is a focus in cognitive rehabilitation. Compared to conventional working memory training methods, VR-based training provides a more immersive experience with realistic scenarios, offering enhanced transferability to daily life. However, existing VR-based training methods often focus on basic cognitive tasks, underutilize VR’s realism, and rely heavily on subjective assessment methods. In this paper, we introduce a VR Sandbox for working memory training and evaluation, MEM-Box, which simulates everyday life scenarios and routines and adaptively adjusts task difficulty based on user performance. We conducted a training experiment utilizing the MEM-Box and compared it with a control group undergoing PC-based training. The results of the Stroop test indicate that both groups demonstrated improvements in working memory abilities, with MEM-Box training showing greater efficacy. Physiological data confirmed the effectiveness of the MEM-Box, as we observed lower HRV and SDNN. Furthermore, the results of the frequency-domain analysis indicate higher sympathetic nervous system activity (LFpower and LF/HF) during MEM-Box training, which is related to the higher sense of presence in VR. These metrics pave the way for building adaptive VR systems based on physiological data.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: FPO++: efficient encoding and rendering of dynamic neural radiance fields by analyzing and enhancing Fourier PlenOctrees
Authors: Saskia Rabich, Patrick Stotko, Reinhard Klein
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03475-3
Fourier PlenOctrees have been shown to be an efficient representation for real-time rendering of dynamic neural radiance fields (NeRF). Despite its many advantages, the method suffers from artifacts introduced by the involved compression when combined with recent state-of-the-art techniques for training the static per-frame NeRF models. In this paper, we perform an in-depth analysis of these artifacts and leverage the resulting insights to propose an improved representation. In particular, we present a novel density encoding that adapts the Fourier-based compression to the characteristics of the transfer function used by the underlying volume rendering procedure, leading to a substantial reduction of artifacts in the dynamic model. We demonstrate the effectiveness of our enhanced Fourier PlenOctrees through quantitative and qualitative evaluations on synthetic and real-world scenes.
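The interaction the paper targets involves two standard pieces: per-frame density reconstructed from a truncated Fourier series, and the exponential transfer function of volume rendering, through which ringing in the compressed density becomes visible alpha artifacts. A minimal sketch of those two pieces under their standard definitions; the improved density encoding itself is the paper's contribution and is not reproduced here.

```python
import numpy as np

def fourier_eval(coeffs, t, T):
    """Evaluate a truncated real Fourier series at time t in [0, T), with
    coeffs laid out as [a0, a1, b1, a2, b2, ...] (DC plus paired harmonics)."""
    val = coeffs[0]
    n_harm = (len(coeffs) - 1) // 2
    for k in range(1, n_harm + 1):
        w = 2.0 * np.pi * k * t / T
        val += coeffs[2 * k - 1] * np.cos(w) + coeffs[2 * k] * np.sin(w)
    return val

def composite(sigmas, colors, delta):
    """Standard volume rendering along a ray: alpha_i = 1 - exp(-sigma_i * delta),
    weights = transmittance * alpha. sigmas: (n,), colors: (n, 3)."""
    alphas = 1.0 - np.exp(-np.maximum(sigmas, 0.0) * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)
```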
{"title":"FPO++: efficient encoding and rendering of dynamic neural radiance fields by analyzing and enhancing Fourier PlenOctrees","authors":"Saskia Rabich, Patrick Stotko, Reinhard Klein","doi":"10.1007/s00371-024-03475-3","DOIUrl":"https://doi.org/10.1007/s00371-024-03475-3","url":null,"abstract":"<p>Fourier PlenOctrees have shown to be an efficient representation for real-time rendering of dynamic neural radiance fields (NeRF). Despite its many advantages, this method suffers from artifacts introduced by the involved compression when combining it with recent state-of-the-art techniques for training the static per-frame NeRF models. In this paper, we perform an in-depth analysis of these artifacts and leverage the resulting insights to propose an improved representation. In particular, we present a novel density encoding that adapts the Fourier-based compression to the characteristics of the transfer function used by the underlying volume rendering procedure and leads to a substantial reduction of artifacts in the dynamic model. We demonstrate the effectiveness of our enhanced Fourier PlenOctrees in the scope of quantitative and qualitative evaluations on synthetic and real-world scenes.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Multi-feature fusion enhanced monocular depth estimation with boundary awareness
Authors: Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03498-w
Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional single-image depth estimation methods encounter limitations in the photometric loss: a lack of geometric constraints, reliance on pixel-level intensity or color differences, and the assumption of perfect photometric consistency. These lead to errors in challenging conditions and to overly smooth depth maps that fail to capture object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are fed into our decoder and enhanced with attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. To mitigate the weakening of semantic features during convolution, we introduce a feature-enhanced combination strategy, and we integrate the DeconvUp module to improve the restoration of depth map boundaries. We further introduce a boundary loss that enforces constraints between object boundaries, and we propose an extended evaluation method that uses Laplacian pyramid residuals to evaluate boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate the superior performance of MFFENet compared to state-of-the-art models in monocular depth estimation.
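For reference, the photometric loss whose limitations are discussed above is commonly a weighted SSIM + L1 term between a view-synthesized (warped) frame and the target frame. A minimal single-channel numpy sketch of that baseline term, assuming images normalized to [0, 1]; MFFENet's boundary loss and feature modules are separate additions not shown here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Per-pixel SSIM over a 3x3 window for single-channel images in [0, 1]."""
    mu_x, mu_y = uniform_filter(x, 3), uniform_filter(y, 3)
    var_x = uniform_filter(x * x, 3) - mu_x**2
    var_y = uniform_filter(y * y, 3) - mu_y**2
    cov = uniform_filter(x * y, 3) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def photometric_loss(warped, target, alpha=0.85):
    """Weighted SSIM + L1 photometric loss between a warped frame and the target;
    note it compares raw intensities only, with no geometric constraint."""
    dssim = np.clip((1.0 - ssim(warped, target)) / 2.0, 0.0, 1.0)
    l1 = np.abs(warped - target)
    return float((alpha * dssim + (1.0 - alpha) * l1).mean())
```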
{"title":"Multi-feature fusion enhanced monocular depth estimation with boundary awareness","authors":"Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang","doi":"10.1007/s00371-024-03498-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03498-w","url":null,"abstract":"<p>Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional methods for single-image depth estimation encounter limitations in photometric loss due to a lack of geometric constraints, reliance on pixel-level intensity or color differences, and the assumption of perfect photometric consistency, leading to errors in challenging conditions and resulting in overly smooth depth maps with insufficient capture of object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are then fed into our decoder and enhanced using attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. To mitigate the weakening of semantic features during convolution processes, we introduce a feature-enhanced combination strategy. We also integrate the DeconvUp module to improve the restoration of depth map boundaries. We introduce a boundary loss that enforces constraints between object boundaries. We propose an extended evaluation method that utilizes Laplacian pyramid residuals to evaluate boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate the superior performance of MFFENet compared to state-of-the-art models in monocular depth estimation.\u0000</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Road crack detection using pixel classification and intensity-based distinctive fuzzy C-means clustering
Authors: Munish Bhardwaj, Nafis Uddin Khan, Vikas Baghel
Pub Date: 2024-06-22 | DOI: 10.1007/s00371-024-03470-8
Road cracks are quickly becoming one of the world's most serious concerns: they affect traffic safety and increase the likelihood of road accidents, and a significant amount of money is spent each year on road repair and upkeep. This cost can be lowered if cracks are discovered in good time. However, detection takes longer and is less precise when done manually, and ambient noise, intensity inhomogeneity, and low contrast make automatic crack identification difficult. As a result, several techniques have been developed in the past to pinpoint the specific site of a crack. In this research, a novel fuzzy C-means clustering algorithm is proposed that detects cracks automatically by adding optimal edge pixels, utilizing a second-order difference and intensity-based edge and non-edge fuzzy factors. The technique exploits the intensity of edge and non-edge pixels, allowing it to recognize edges even when the image has little contrast. The method does not require a data set to train a model, and no critical parameter optimization is needed; as a result, it can recognize edges and cracks even in novel or previously unseen input images from different environments. Experimental results reveal that the proposed fuzzy C-means clustering-based segmentation method beats many existing methods for detecting alligator, transverse, and longitudinal cracks in road images in terms of precision, recall, F1 score, PSNR, and execution time.
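The baseline the proposed factors extend is the standard fuzzy C-means iteration: alternating membership and center updates under the usual m-exponent weighting. A minimal sketch on 1-D pixel intensities, assuming two clusters (crack vs. background); the paper's edge/non-edge fuzzy factors and second-order-difference term are not included.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters=2, m=2.0, n_iters=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means on a 1-D intensity vector x.
    Returns cluster centers and the membership matrix U (n_clusters x n)."""
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.shape[0]))
    u /= u.sum(axis=0)                              # memberships sum to 1 per pixel
    for _ in range(n_iters):
        um = u**m
        centers = (um @ x) / um.sum(axis=1)          # weighted cluster means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        new_u = 1.0 / d ** (2.0 / (m - 1.0))         # standard membership update
        new_u /= new_u.sum(axis=0)
        if np.abs(new_u - u).max() < tol:
            u = new_u
            break
        u = new_u
    return centers, u

# Usage: cluster grayscale intensities, then take pixels whose membership in
# the darker cluster dominates as crack candidates.
```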
{"title":"Road crack detection using pixel classification and intensity-based distinctive fuzzy C-means clustering","authors":"Munish Bhardwaj, Nafis Uddin Khan, Vikas Baghel","doi":"10.1007/s00371-024-03470-8","DOIUrl":"https://doi.org/10.1007/s00371-024-03470-8","url":null,"abstract":"<p>Road cracks are quickly becoming one of the world's most serious concerns. It may have an impact on traffic safety and increase the likelihood of road accidents. A significant amount of money is spent each year for road repair and upkeep. This cost can be lowered if the cracks are discovered in good time. However, detection takes longer and is less precise when done manually. Because of ambient noise, intensity in-homogeneity, and low contrast, crack identification is a complex technique for automatic processes. As a result, several techniques have been developed in the past to pinpoint the specific site of the crack. In this research, a novel fuzzy C-means clustering algorithm is proposed that will detect fractures automatically by adding optimal edge pixels utilizing a second-order difference and intensity-based edge and non-edge fuzzy factors. This technique provides information of the intensity of edge and non-edge pixels, allowing it to recognize edges even when the image has little contrast. This method does not necessitate the use of any data set to train the model and no any critical parameter optimization is required. As a result, it can recognize edges or fissures even in novel or previously unknown input pictures of different environments. The experimental results reveal that the unique fuzzy C-means clustering-based segmentation method beats many of the existing methods used for detecting alligator, transverse, and longitudinal fractures from road photos in terms of precession, recall, and F1 score, PSNR, and execution time.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Development and validation of a real-time vision-based automatic HDMI wire-split inspection system
Authors: Yu-Chen Chiu, Chi-Yi Tsai, Po-Hsiang Chang
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03436-w
In the production process of HDMI cables, manual intervention is often required, resulting in low production efficiency and long processing times. This paper presents a real-time vision-based automatic inspection system for HDMI cables that reduces the labor requirement in the production process. The system consists of a hardware and a software design. Since the wires in HDMI cables are tiny objects, the hardware design includes an image capture platform with a high-resolution camera and a ring light source to acquire high-resolution, high-quality images of the wires. The software design includes a data augmentation system and an automatic HDMI wire-split inspection system. The former increases the number and diversity of training samples. The latter detects the coordinate position of each wire center and the corresponding Pin-ID (pid) number and outputs the results to the wire-bonding machine for subsequent tasks. In addition, a new HDMI cable dataset is created to train and evaluate a series of existing detection network models for this study. Experimental results show that the detection accuracy of the wire center using the existing YOLOv4 detector reaches 99.9%. Furthermore, the proposed system reduces execution time by about 38.67% compared with the traditional manual wire-split inspection operation.
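One plausible post-processing step between the detector and the wire-bonding machine is ordering the detected wire centers and numbering them. The sketch below illustrates that idea under the assumption that pins are numbered left to right; the abstract does not specify the actual pid assignment, so this is hypothetical.

```python
def assign_pin_ids(detections):
    """Order detected wire centers left to right and number them 1..N.
    detections: [(x_center, y_center, confidence), ...] from the detector.
    Hypothetical post-processing; the paper's pid assignment may differ."""
    ordered = sorted(detections, key=lambda d: d[0])
    return [(pid, x, y) for pid, (x, y, _) in enumerate(ordered, start=1)]

# e.g. assign_pin_ids([(412.0, 95.2, 0.99), (128.5, 96.1, 0.98)])
# -> [(1, 128.5, 96.1), (2, 412.0, 95.2)]
```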
{"title":"Development and validation of a real-time vision-based automatic HDMI wire-split inspection system","authors":"Yu-Chen Chiu, Chi-Yi Tsai, Po-Hsiang Chang","doi":"10.1007/s00371-024-03436-w","DOIUrl":"https://doi.org/10.1007/s00371-024-03436-w","url":null,"abstract":"<p>In the production process of HDMI cables, manual intervention is often required, resulting in low production efficiency and time-consuming. The paper presents a real-time vision-based automatic inspection system for HDMI cables to reduce the labor requirement in the production process. The system consists of hardware and software design. Since the wires in HDMI cables are tiny objects, the hardware design includes an image capture platform with a high-resolution camera and a ring light source to acquire high-resolution and high-quality images of the wires. The software design includes a data augmentation system and an automatic HDMI wire-split inspection system. The former aims to increase the number and diversity of training samples. The latter is designed to detect the coordinate position of the wire center and the corresponding Pin-ID (<i>pid</i>) number and output the results to the wire-bonding machine to perform subsequent tasks. In addition, a new HDMI cable dataset is created to train and evaluate a series of existing detection network models for this study. The experimental results show that the detection accuracy of the wire center using the existing YOLOv4 detector reaches 99.9%. Furthermore, the proposed system reduces the execution time by about 38.67% compared with the traditional manual wire-split inspection operation.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Robust point cloud normal estimation via multi-level critical point aggregation
Authors: Jun Zhou, Yaoshun Li, Mingjie Wang, Nannan Li, Zhiyang Li, Weixiao Wang
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03532-x
We propose a multi-level critical point aggregation architecture based on a graph attention mechanism for 3D point cloud normal estimation, which can efficiently focus on locally important points during feature extraction. Within it, the local feature aggregation (LFA) module and the global feature refinement (GFR) module are designed to accurately identify critical points that lie geometrically closer to the tangent plane for surface fitting, at both local and global levels. Specifically, the LFA module captures significant local information from neighboring points with strong geometric correlations to the query point in the low-level feature space. The GFR module enhances the exploration of global geometric correlations in the high-level feature space, allowing the network to focus precisely on critical global points. To address indistinguishable features in the low-level space, we implement a stacked LFA structure that transfers essential adjacent information across multiple levels, enabling deep feature aggregation layer by layer. The GFR module can then leverage robust local geometric information and refine it into comprehensive global features. Our multi-level point-aware architecture improves the stability and accuracy of surface fitting and normal estimation, even in the presence of sharp features, high noise, or anisotropic structures. Experimental results demonstrate that our method is competitive and achieves stable performance on both synthetic and real-world datasets. Code is available at https://github.com/CharlesLee96/NormalEstimation.
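The downstream step that critical-point weighting feeds is weighted plane fitting: a normal is the eigenvector of the smallest eigenvalue of a weighted neighborhood covariance. A minimal sketch with simple distance-based soft weights standing in for the learned LFA/GFR attention; this is a baseline illustration, not the paper's network.

```python
import numpy as np
from scipy.spatial import cKDTree

def weighted_normal(points, center, weights):
    """Normal from the smallest eigenvector of the weighted covariance of a
    neighborhood, i.e., the surface-fitting step that point weighting feeds."""
    d = points - center
    cov = (weights[:, None] * d).T @ d / weights.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    return eigvecs[:, 0]

def estimate_normals(cloud, k=16, temperature=0.1):
    """Per-point normals with soft weights that downweight distant neighbors,
    a crude stand-in for learned critical-point attention."""
    tree = cKDTree(cloud)
    _, idx = tree.query(cloud, k=k + 1)              # first neighbor is the point itself
    normals = np.empty_like(cloud)
    for i, nbrs in enumerate(idx):
        pts = cloud[nbrs[1:]]
        dist2 = ((pts - cloud[i]) ** 2).sum(axis=1)
        w = np.exp(-dist2 / (temperature * dist2.mean() + 1e-12))
        normals[i] = weighted_normal(pts, cloud[i], w)
    return normals
```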
{"title":"Robust point cloud normal estimation via multi-level critical point aggregation","authors":"Jun Zhou, Yaoshun Li, Mingjie Wang, Nannan Li, Zhiyang Li, Weixiao Wang","doi":"10.1007/s00371-024-03532-x","DOIUrl":"https://doi.org/10.1007/s00371-024-03532-x","url":null,"abstract":"<p>We propose a multi-level critical point aggregation architecture based on a graph attention mechanism for 3D point cloud normal estimation, which can efficiently focus on locally important points during the feature extraction process. Wherein, the local feature aggregation (LFA) module and the global feature refinement (GFR) module are designed to accurately identify critical points which are geometrically closer to tangent plane for surface fitting at both local and global levels. Specifically, the LFA module captures significant local information from neighboring points with strong geometric correlations to the query point in the low-level feature space. The GFR module enhances the exploration of global geometric correlations in the high-level feature space, allowing the network to focus precisely on critical global points. To address indistinguishable features in the low-level space, we implement a stacked LFA structure. This structure transfers essential adjacent information across multiple levels, enabling deep feature aggregation layer by layer. Then the GFR module can leverage robust local geometric information and refines it into comprehensive global features. Our multi-level point-aware architecture improves the stability and accuracy of surface fitting and normal estimation, even in the presence of sharp features, high noise or anisotropic structures. Experimental results demonstrate that our method is competitive and achieves stable performance on both synthetic and real-world datasets. Code is available at https://github.com/CharlesLee96/NormalEstimation.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: MCLGAN: a multi-style cartoonization method based on style condition information
Authors: Canlin Li, Xinyue Wang, Ran Yi, Wenjiao Zhang, Lihua Bi, Lizhuang Ma
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03550-9
Image cartoonization, a special kind of style transformation, is a challenging image processing task. Most existing cartoonization methods target single-style transformation; achieving multi-style transformation then requires training multiple models, which is time- and resource-consuming. Meanwhile, existing multi-style cartoonization methods based on generative adversarial networks require multiple discriminators to handle different styles, which increases the complexity of the network. To solve these issues, this paper proposes MCLGAN, an image cartoonization method for multi-style transformation based on style condition information. The approach integrates two key components to promote multi-style image cartoonization. First, we design a conditional generator and a multi-style learning discriminator to embed the style condition information into the feature space, enhancing the ability of the model to realize different cartoon styles. Then a new loss mechanism, the conditional contrastive loss, is used strategically to strengthen the difference between styles, effectively realizing multi-style image cartoonization. At the same time, MCLGAN simplifies the cartoonization process for different styles and only needs to train the model once, which significantly improves efficiency. Numerous experiments verify the validity of our method and demonstrate its superiority over previous methods.
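A conditional contrastive loss of the kind named here is typically an InfoNCE-style objective over feature embeddings, with same-style samples as positives and other styles as negatives. A minimal numpy sketch under that generic reading; the paper's exact formulation (what is embedded and how the condition enters) may differ.

```python
import numpy as np

def conditional_contrastive_loss(z, styles, tau=0.1):
    """InfoNCE-style loss over feature vectors z (n x d): samples sharing a
    style label are pulled together, different styles pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity space
    styles = np.asarray(styles)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    losses = []
    for i in range(len(z)):
        pos = (styles == styles[i]) & (np.arange(len(z)) != i)
        if pos.any():
            losses.append(-log_prob[i, pos].mean())
    return float(np.mean(losses))
```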
{"title":"MCLGAN: a multi-style cartoonization method based on style condition information","authors":"Canlin Li, Xinyue Wang, Ran Yi, Wenjiao Zhang, Lihua Bi, Lizhuang Ma","doi":"10.1007/s00371-024-03550-9","DOIUrl":"https://doi.org/10.1007/s00371-024-03550-9","url":null,"abstract":"<p>Image cartoonization, a special kind of style transformation, is a challenging image processing task. Most existing cartoonization methods aim at single-style transformation. While multiple models are trained to achieve multi-style transformation, which is time-consuming and resource-consuming. Meanwhile, existing multi-style cartoonization methods based on generative adversarial network require multiple discriminators to handle different styles, which increases the complexity of the network. To solve the above issues, this paper proposes an image cartoonization method for multi-style transformation based on style condition information, called MCLGAN. This approach integrates two key components for promoting multi-style image cartoonization. Firstly, we design a conditional generator and a multi-style learning discriminator to embed the style condition information into the feature space, so as to enhance the ability of the model in realizing different cartoon styles. Then the new loss mechanism, the conditional contrastive loss, is used strategically to strengthen the difference between different styles, thus effectively realizing multi-style image cartoonization. At the same time, MCLGAN simplifies the cartoonization process of different styles images, and only needs to train the model once, which significantly improves the efficiency. Numerous experiments verify the validity of our method as well as demonstrate the superiority of our method compared to previous methods.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Digital human and embodied intelligence for sports science: advancements, opportunities and prospects
Authors: Xiang Suo, Weidi Tang, Lijuan Mao, Zhen Li
Pub Date: 2024-06-21 | DOI: 10.1007/s00371-024-03547-4
This paper presents a comprehensive review of state-of-the-art motion capture techniques for digital human modeling in sports, including traditional optical motion capture systems, wearable sensor capture systems, computer vision capture systems, and fusion motion capture systems. The review explores the strengths, limitations, and applications of each technique in the context of sports science, such as performance analysis, technique optimization, injury prevention, and interactive training. The paper highlights the significance of accurate and comprehensive motion data acquisition for creating high-fidelity digital human models that can replicate an athlete’s movements and biomechanics. However, several challenges and limitations are identified, such as limited capture volume, marker occlusion, accuracy limitations, lack of diverse datasets, and computational complexity. To address these challenges, the paper emphasizes the need for collaborative efforts from researchers and practitioners across various disciplines. By bridging theory and practice and identifying application-specific challenges and solutions, this review aims to facilitate cross-disciplinary collaboration and guide future research and development efforts in harnessing the power of digital human technology for sports science advancement, ultimately unlocking new possibilities for athlete performance optimization and health.
{"title":"Digital human and embodied intelligence for sports science: advancements, opportunities and prospects","authors":"Xiang Suo, Weidi Tang, Lijuan Mao, Zhen Li","doi":"10.1007/s00371-024-03547-4","DOIUrl":"https://doi.org/10.1007/s00371-024-03547-4","url":null,"abstract":"<p>This paper presents a comprehensive review of state-of-the-art motion capture techniques for digital human modeling in sports, including traditional optical motion capture systems, wearable sensor capture systems, computer vision capture systems, and fusion motion capture systems. The review explores the strengths, limitations, and applications of each technique in the context of sports science, such as performance analysis, technique optimization, injury prevention, and interactive training. The paper highlights the significance of accurate and comprehensive motion data acquisition for creating high-fidelity digital human models that can replicate an athlete’s movements and biomechanics. However, several challenges and limitations are identified, such as limited capture volume, marker occlusion, accuracy limitations, lack of diverse datasets, and computational complexity. To address these challenges, the paper emphasizes the need for collaborative efforts from researchers and practitioners across various disciplines. By bridging theory and practice and identifying application-specific challenges and solutions, this review aims to facilitate cross-disciplinary collaboration and guide future research and development efforts in harnessing the power of digital human technology for sports science advancement, ultimately unlocking new possibilities for athlete performance optimization and health.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}