The primary aim of image-based virtual try-on is to seamlessly deform the target garment image to align with the human body. Owing to the inherent non-rigid nature of garments, current methods prioritise flexible deformation through appearance flow with high degrees of freedom. However, existing appearance flow estimation methods focus solely on the correlation of local feature information. While this strategy avoids the heavy computation required to correlate feature maps globally, it makes it difficult for garments to adapt to large deformations. To overcome these limitations, we propose the GIC-Flow framework, which obtains appearance flow by computing the global information correlation while keeping the computational cost low. Specifically, our global streak information matching module decomposes the appearance flow into horizontal and vertical vectors, effectively propagating global information in both directions. This considerably reduces the computational requirements, contributing to an efficient estimation process. In addition, to ensure accurate deformation of local garment texture, we propose the local aggregate information matching module, which aggregates information from the nearest neighbours before computing the global correlation and enhances weak semantic information. Comprehensive experiments on the VITON and VITON-HD datasets show that GIC-Flow outperforms existing state-of-the-art algorithms, particularly in cases involving complex garment deformation.
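To make the decomposition concrete, the following is a minimal sketch (not the authors' implementation; tensor shapes, names, and the soft-argmax readout are assumptions) of an axis-wise global correlation that correlates each pixel only with pixels in the same row and the same column, yielding horizontal and vertical flow components at far lower cost than a full global correlation:

```python
# Minimal sketch of a separable (row-wise + column-wise) global correlation.
# Names and the soft-argmax flow readout are illustrative assumptions.
import torch

def separable_global_correlation(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """feat_a, feat_b: (B, C, H, W) feature maps of garment and person.

    Returns horizontal and vertical flow components; the combined cost is
    O(H*W*(H+W)) instead of O((H*W)^2) for a full global correlation.
    """
    b, c, h, w = feat_a.shape
    scale = c ** 0.5

    # Horizontal: correlate every pixel with all pixels in the same row.
    a_rows = feat_a.permute(0, 2, 3, 1)                 # (B, H, W, C)
    b_rows = feat_b.permute(0, 2, 1, 3)                 # (B, H, C, W)
    corr_h = torch.matmul(a_rows, b_rows) / scale       # (B, H, W, W)

    # Vertical: correlate every pixel with all pixels in the same column.
    a_cols = feat_a.permute(0, 3, 2, 1)                 # (B, W, H, C)
    b_cols = feat_b.permute(0, 3, 1, 2)                 # (B, W, C, H)
    corr_v = torch.matmul(a_cols, b_cols) / scale       # (B, W, H, H)

    # Soft-argmax along each axis gives a horizontal and a vertical displacement.
    xs = torch.arange(w, dtype=feat_a.dtype, device=feat_a.device)
    ys = torch.arange(h, dtype=feat_a.dtype, device=feat_a.device)
    flow_x = (corr_h.softmax(dim=-1) * xs).sum(-1) - xs.view(1, 1, w)  # (B, H, W)
    flow_y = (corr_v.softmax(dim=-1) * ys).sum(-1) - ys.view(1, 1, h)  # (B, W, H)
    return flow_x, flow_y.transpose(1, 2)               # both (B, H, W)
```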
Surface reconstruction from point clouds is a central task in 3D modeling. Recently, attractive approaches have addressed this problem by learning neural implicit representations, e.g., unsigned distance functions (UDFs), from point clouds, and have achieved good performance. However, existing UDF-based methods still struggle to recover local geometric details. One of the difficulties arises from the inflexible representations used, which are hard-pressed to capture local high-fidelity geometric details. In this paper, we propose a novel neural implicit representation, named MuSic-UDF, which leverages Multi-Scale dynamic grids for high-fidelity and flexible surface reconstruction from raw point clouds with arbitrary topologies. Specifically, we initialize a hierarchical voxel grid where each grid point stores a learnable 3D coordinate. Then, we optimize these grids so that different levels of geometric structure can be captured adaptively. To further explore geometric details, we introduce a frequency encoding strategy to hierarchically encode these coordinates. MuSic-UDF does not require any supervision such as ground-truth distance values or point normals. We conduct comprehensive experiments on widely used benchmarks, where the results demonstrate the superior performance of our proposed method compared to state-of-the-art methods.
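A minimal sketch, under assumed details (grid resolutions, number of encoding frequencies, MLP size; not the released MuSic-UDF code), of how a hierarchy of grids storing learnable 3D coordinates can be trilinearly interpolated, frequency-encoded, and decoded to an unsigned distance:

```python
# Illustrative multi-scale grid of learnable 3D coordinates + frequency
# encoding + MLP predicting an unsigned distance. All hyperparameters assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def frequency_encode(x: torch.Tensor, n_freqs: int = 6) -> torch.Tensor:
    """Standard sin/cos positional encoding of 3D coordinates."""
    freqs = 2.0 ** torch.arange(n_freqs, device=x.device) * torch.pi
    angles = x[..., None] * freqs                        # (..., 3, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class MultiScaleCoordUDF(nn.Module):
    def __init__(self, resolutions=(16, 32, 64), n_freqs=6, hidden=256):
        super().__init__()
        # Each level stores a learnable 3D coordinate per grid vertex,
        # initialized to the vertex position itself.
        self.grids = nn.ParameterList()
        for r in resolutions:
            lin = torch.linspace(-1, 1, r)
            zz, yy, xx = torch.meshgrid(lin, lin, lin, indexing="ij")
            coords = torch.stack([xx, yy, zz], dim=0).unsqueeze(0)  # (1, 3, r, r, r)
            self.grids.append(nn.Parameter(coords.clone()))
        self.n_freqs = n_freqs
        in_dim = len(resolutions) * 3 * 2 * n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),         # UDF is non-negative
        )

    def forward(self, q: torch.Tensor) -> torch.Tensor:
        """q: (N, 3) query points in [-1, 1]^3 -> (N,) unsigned distances."""
        sample = q.view(1, -1, 1, 1, 3)                  # grid_sample expects (x, y, z)
        feats = []
        for g in self.grids:
            c = F.grid_sample(g, sample, align_corners=True)  # (1, 3, N, 1, 1)
            c = c.view(3, -1).t()                        # (N, 3) interpolated coordinate
            feats.append(frequency_encode(c, self.n_freqs))
        return self.mlp(torch.cat(feats, dim=-1)).squeeze(-1)
```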
In various situations, such as clinical environments with sterile conditions or when hands are occupied with multiple devices, traditional methods of navigation and scene adjustment are impractical or even impossible. We explore a new solution that uses voice control to facilitate interaction in virtual worlds and avoid the need for additional controllers. To this end, we investigate three scenarios: Object Orientation, Visualization Customization, and Analytical Tasks, and evaluate whether natural language interaction is possible and promising in each of them. In our quantitative user study, participants were able to control virtual environments effortlessly using verbal instructions, resulting in rapid orientation adjustments, adaptive visual aids, and accurate data analysis. In addition, user satisfaction and usability surveys showed consistently high levels of acceptance and ease of use. In conclusion, our study shows that natural language can be a promising alternative for improving user interaction in virtual environments. It enables intuitive interactions in virtual spaces, especially in situations where traditional controls have limitations.
Aortic dissection is a rare disease in which the layers of the aortic wall separate, splitting the aortic lumen into two flow channels: the true and the false lumen. The rarity of the disease leads to a sparsity of available datasets and hence little training data for in-silico studies or the training of machine learning algorithms. To mitigate this issue, we use statistical shape modeling to create a database of Stanford type B dissection surface meshes. We account for the complex disease anatomy by modeling two separate flow channels in the aorta, the true and the false lumen; former approaches mainly modeled the aortic arch including its branches, but not two separate flow channels inside the aorta. To our knowledge, our approach is the first attempt to generate synthetic aortic dissection surface meshes. For the statistical shape model, the aorta is parameterized using the centerlines of the respective lumina and the corresponding ellipses describing their cross-sections, aligned along the centerline using rotation-minimizing frames. To evaluate our approach, we introduce disease-specific quality criteria by investigating the torsion and twist of the true lumen.
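As an illustration of this parameterization, the following sketch (illustrative only; function and parameter names are assumptions, not the paper's pipeline) sweeps elliptical cross-sections along a centerline using rotation-minimizing frames computed with the double-reflection method:

```python
# Sweep elliptical cross-sections along a lumen centerline with
# rotation-minimizing frames (double-reflection method, Wang et al. 2008).
import numpy as np

def rotation_minimizing_frames(points: np.ndarray) -> np.ndarray:
    """points: (N, 3) centerline samples -> (N, 3, 3) frames [t, u, v]."""
    tangents = np.gradient(points, axis=0)
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    # Initial normal: any vector orthogonal to the first tangent.
    u = np.cross(tangents[0], [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-8:
        u = np.cross(tangents[0], [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    frames = [np.stack([tangents[0], u, np.cross(tangents[0], u)])]
    for i in range(len(points) - 1):
        t0, u0 = frames[-1][0], frames[-1][1]
        # Reflect across the chord, then across the next tangent plane,
        # which minimizes twist between consecutive frames.
        v1 = points[i + 1] - points[i]
        c1 = v1 @ v1
        u_l = u0 - (2.0 / c1) * (v1 @ u0) * v1
        t_l = t0 - (2.0 / c1) * (v1 @ t0) * v1
        v2 = tangents[i + 1] - t_l
        c2 = v2 @ v2
        u1 = u_l - (2.0 / c2) * (v2 @ u_l) * v2
        frames.append(np.stack([tangents[i + 1], u1, np.cross(tangents[i + 1], u1)]))
    return np.stack(frames)

def sweep_ellipse(points, radii_a, radii_b, n_circ=32):
    """Place an ellipse (semi-axes radii_a[i], radii_b[i]) in every frame."""
    frames = rotation_minimizing_frames(points)
    phi = np.linspace(0.0, 2.0 * np.pi, n_circ, endpoint=False)
    rings = []
    for p, (t, u, v), a, b in zip(points, frames, radii_a, radii_b):
        rings.append(p + a * np.cos(phi)[:, None] * u + b * np.sin(phi)[:, None] * v)
    return np.stack(rings)   # (N, n_circ, 3) vertex rings of the lumen surface
```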
The success of a structure-preserving filtering technique relies on its capability to recognize the structures and textures present in the input image. In this paper, a novel structure-preserving filtering technique is presented that first generates an edge map of the input image by exploiting semantic information and then applies an edge-aware adaptive recursive median filter to produce the filtered image. The technique provides satisfactory results for a wide variety of images with minimal fine-tuning of its parameters. Moreover, beyond various computer graphics applications, the proposed technique also proves robust at incorporating spatial information for spectral-spatial classification of hyperspectral images. A MATLAB implementation of the proposed technique is available at https://www.github.com/K-Pradhan/A-semantic-edge-aware-parameter-efficient-image-filtering-technique
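The following toy sketch (not the released MATLAB code; a Sobel edge map stands in for the semantic edge map, and all thresholds are assumptions) illustrates the core mechanics of an edge-aware recursive median filter:

```python
# Edge-aware recursive median filter: the window at each pixel excludes pixels
# marked as edges, and already-filtered values are reused as the scan proceeds.
import numpy as np
from scipy import ndimage

def edge_aware_recursive_median(img: np.ndarray, radius: int = 2,
                                edge_thresh: float = 0.15) -> np.ndarray:
    """img: 2D grayscale array with values in [0, 1]."""
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    edges = np.hypot(gx, gy) > edge_thresh        # stand-in for the semantic edge map

    out = img.astype(np.float64).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = out[y0:y1, x0:x1]            # recursive: uses filtered values
            mask = ~edges[y0:y1, x0:x1]           # edge-aware: skip edge pixels
            vals = window[mask]
            if vals.size > 0 and not edges[y, x]:
                out[y, x] = np.median(vals)       # smooth only non-edge pixels
    return out
```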
Software testing is a vital tool for ensuring the quality and trustworthiness of the software produced. Test suites are often large, which makes testing a costly and time-consuming process. In this context, test case prioritization (TCP) methods play an important role by ranking test cases to enable early fault detection and, hence, quicker fixes. Evaluating such methods is difficult, owing to the variety of methods and objectives. To address this issue, we present TPVis, a visual analytics framework, designed in collaboration with software testing experts, that enables the evaluation and comparison of TCP methods. Our solution is an open-source web application that provides a variety of analytical tools to assist in the exploration of test suites and prioritization algorithms. Furthermore, TPVis also provides dashboard presets, validated with our domain collaborators, that support common analysis goals. We demonstrate the usefulness of TPVis through a series of use cases that illustrate our system’s flexibility in addressing different problems in analyzing TCP methods. Finally, we report feedback from the domain experts that indicates the effectiveness of TPVis. TPVis is available at https://github.com/vixe-cin-ufpe/TPVis.
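For readers unfamiliar with how TCP orderings are typically scored, the sketch below computes APFD (Average Percentage of Faults Detected), a standard evaluation metric for prioritizations; it is shown for illustration only and is not necessarily one of TPVis's built-in measures:

```python
# APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where TF_i is the
# position of the first test that reveals fault i. Assumes every fault is
# detected by at least one test in the ordering.
def apfd(ordering, faults_per_test):
    """ordering: list of test ids in prioritized order.
    faults_per_test: dict mapping test id -> set of fault ids it detects."""
    all_faults = set().union(*faults_per_test.values())
    n, m = len(ordering), len(all_faults)
    first_pos = {}
    for pos, test in enumerate(ordering, start=1):
        for fault in faults_per_test.get(test, ()):
            first_pos.setdefault(fault, pos)
    return 1.0 - sum(first_pos[f] for f in all_faults) / (n * m) + 1.0 / (2 * n)

# Example: three tests, two faults; detecting both faults early scores higher.
faults = {"t1": {"f1"}, "t2": set(), "t3": {"f2"}}
print(apfd(["t1", "t3", "t2"], faults))   # ~0.667
print(apfd(["t2", "t1", "t3"], faults))   # ~0.333
```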
In volume visualization, a transfer function tailored for one volume usually does not work for other similar volumes without careful tuning. This process can be tedious and time-consuming for a large set of volumes. In this work, we present a novel approach to transfer function optimization based on the differentiable volume rendering of a reference volume and its corresponding transfer function. Using two fully connected neural networks, our approach learns a continuous 2D separable transfer function that visualizes the features of interest with consistent visual properties across the volumes. Because many volume visualization software packages support separable transfer functions, users can export the optimized transfer function into a domain-specific application for further interaction. In tandem with domain experts’ input and assessments, we present two use cases to demonstrate the effectiveness of our approach. The first use case tracks the effect of an asteroid blast near the ocean surface. In this application, a volume and its corresponding transfer function seed our method, cascading transfer function optimization for the subsequent time steps. The second use case focuses on the visualization of white matter, gray matter, and cerebrospinal fluid in magnetic resonance imaging (MRI) volumes. We optimize an intensity-gradient transfer function for one volume from its segmentation. We then use these results to visualize other brain volumes with different intensity ranges acquired on different MRI machines.
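A minimal sketch (an assumed setup, not the authors' code) of a 2D separable transfer function represented by two fully connected networks, where the opacity factorizes into an intensity term and a gradient-magnitude term so the result can be baked into two 1D lookup tables for export:

```python
# Separable 2D transfer function: intensity -> color/opacity, gradient
# magnitude -> opacity modulation. Hidden sizes and activations are assumptions.
import torch
import torch.nn as nn

class SeparableTransferFunction(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.intensity_net = nn.Sequential(   # intensity -> (R, G, B, alpha)
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 4), nn.Sigmoid(),
        )
        self.gradient_net = nn.Sequential(    # gradient magnitude -> opacity scale
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, intensity: torch.Tensor, grad_mag: torch.Tensor):
        """Both inputs: (N, 1) values normalized to [0, 1].
        Returns (N, 3) color and (N, 1) opacity."""
        rgba = self.intensity_net(intensity)
        alpha = rgba[:, 3:4] * self.gradient_net(grad_mag)   # separable opacity
        return rgba[:, :3], alpha

# The two networks would be optimized end to end through a differentiable ray
# marcher, e.g. loss = image_loss(render(volume, tf), reference); loss.backward(),
# before exporting the learned functions as 1D lookup tables.
```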
Translating written sentences from oral languages into a sequence of manual and non-manual gestures plays a crucial role in building a more inclusive society for deaf and hard-of-hearing people. Facial expressions (non-manual gestures), in particular, encode the grammar of the sentence to be spoken, applying punctuation and pronouns or emphasizing signs. These non-manual gestures are closely related to the semantics of the sentence being spoken and to the speaker’s emotions. However, most Sign Language Production (SLP) approaches are centered on synthesizing manual gestures and do not model the speaker’s expression. This paper introduces a new method focused on synthesizing facial expressions for sign language. Our goal is to improve sign language production by integrating sentiment information into facial expression generation. The approach leverages a sentence’s sentiment and semantic features to sample from a meaningful representation space, integrating the bias of the non-manual components into the sign language production process. To evaluate our method, we extend the Fréchet gesture distance (FGD), propose a new metric called the Fréchet Expression Distance (FED), and apply an extensive set of metrics to assess the quality of specific regions of the face. The experimental results show that our method achieves state-of-the-art performance, outperforming its competitors on the How2Sign and PHOENIX14T datasets. Moreover, our architecture is based on a carefully designed graph pyramid that makes it simpler, easier to train, and capable of leveraging emotions to produce facial expressions. Our code and pretrained models will be available at: https://github.com/verlab/empowering-sign-language.
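FED follows the same Fréchet computation between Gaussian-fitted feature distributions that underlies FGD (and FID); the sketch below shows that shared computation, with the facial-expression feature extractor assumed and not shown:

```python
# Fréchet distance between two sets of feature vectors (the calculation behind
# FGD/FED). The expression feature extractor is an assumption left out here.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """feats_*: (N, D) features of real and generated expression sequences."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real           # drop tiny imaginary parts from sqrtm
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```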
For classification tasks, several strategies aim to tackle the problem of insufficient labeled data, usually by automatic labeling or by passing this task entirely to a user. Automatic labeling is simple to apply but can fail in complex situations where human insight is required to decide the correct labels. Conversely, manual labeling leverages the expertise of specialists but may waste precious effort on cases that automatic methods could handle. More specifically, automatic solutions can be improved by combining an active learning loop with manual labeling assisted by visual depictions of a classifier’s behavior. We propose to include the human in the labeling loop by using manual labeling in feature spaces produced by a deep feature annotation (DeepFA) technique. To assist manual labeling, we provide users with visual insights into the classifier’s decision boundaries. Finally, we use the manual and automatically computed labels jointly to retrain the classifier in an active learning (AL) loop. Experiments on a toy dataset and a real-world application dataset show that our proposed combination of manual labeling, supported by visualization of decision boundaries, and automatic labeling can yield a significant increase in classifier performance with quite limited user effort.
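An illustrative sketch of such a loop (with stand-in components: scikit-learn logistic regression and a confidence threshold replace the DeepFA feature space and the visual labeling interface, and the user is simulated with ground-truth labels):

```python
# Active learning loop mixing automatic labeling of confident samples with
# (simulated) manual labeling of the most uncertain ones.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X, y_true, labeled_idx, n_rounds=5,
                         auto_conf=0.95, manual_budget=10):
    labels = {int(i): int(y_true[i]) for i in labeled_idx}   # initial seed labels
    for _ in range(n_rounds):
        idx = np.fromiter(labels, dtype=int)
        clf = LogisticRegression(max_iter=1000).fit(X[idx], [labels[i] for i in idx])
        unlabeled = np.setdiff1d(np.arange(len(X)), idx)
        if unlabeled.size == 0:
            break
        conf = clf.predict_proba(X[unlabeled]).max(axis=1)
        # Automatic labeling: accept the classifier's label when it is confident.
        for i in unlabeled[conf >= auto_conf]:
            labels[int(i)] = int(clf.predict(X[[i]])[0])
        # Manual labeling: the most uncertain samples go to the user
        # (here simulated with the ground truth y_true).
        for i in unlabeled[np.argsort(conf)][:manual_budget]:
            labels[int(i)] = int(y_true[i])
    return clf, labels
```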