Sketch-based content generation offers flexible controllability, making it a promising narrative avenue in film production. Directors often visualize their imagination by crafting storyboards using sketches and textual descriptions for each shot. However, current video generation methods suffer from three-dimensional inconsistencies, with notable artifacts during large motions or camera pans around scenes. A suitable solution is to directly generate a 4D scene, enabling consistent generation of dynamic three-dimensional scenes. We define the Sketch-2-4D problem, aiming to enhance controllability and consistency in this context. We propose a novel Control Score Distillation Sampling (SDS-C) for sketch-based 4D scene generation, providing precise control over scene dynamics. We further design Spatial Consistency Modules and Temporal Consistency Modules to tackle the spatial and temporal inconsistencies introduced by sketch-based control, respectively. Extensive experiments demonstrate the effectiveness of our approach.
Feature lines play a pivotal role in the reconstruction of CAD models. Currently, there is a lack of robust explicit reconstruction algorithms capable of recovering sharp features from noisy, non-uniformly sampled point clouds. In this paper, we propose a feature-preserving CAD model surface reconstruction algorithm, named FACE. The algorithm begins by preprocessing the point cloud through denoising and resampling, yielding a high-quality point cloud that is free of noise and uniformly distributed. It then employs discrete optimal transport to detect feature regions and generates dense points along potential feature lines to enhance the features. Finally, an advancing-front surface reconstruction method based on normal vector directions is applied to the enhanced point cloud. Extensive experiments demonstrate that, for contaminated point clouds, the algorithm excels not only in reconstructing straight edges and corner points but also in handling curved edges and surfaces, outperforming existing methods.
Mesh-based image vectorization techniques have been studied for a long time, mostly owing to their compactness and flexibility in capturing image features. However, existing methods often lead to relatively dense meshes, especially when applied to images with high-frequency details or textures. We present a novel method that automatically vectorizes an image into a sparse collection of Coons patches whose size adapts to image features. To balance the number of patches and the accuracy of feature alignment, we generate the layout based on a harmonic cross field constrained by image features. We support T-junctions, which keeps the number of patches low and ensures local adaptation to feature density, naturally complemented by varying mesh-color resolution over the patches. Our experimental results demonstrate the utility, accuracy, and sparsity of our method.
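As a point of reference for the patch primitive mentioned above, the following is a minimal sketch of evaluating a generic bilinearly blended Coons patch from its four boundary curves. It is a textbook construction, not the vectorization method itself; the function names (`lerp`, `coons_patch`) and the example boundary curves are assumptions made for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between two points given as tuples."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def coons_patch(bottom, top, left, right, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0, 1]^2.

    bottom(u), top(u), left(v), right(v) are boundary curves returning tuples.
    The curves are assumed to meet at the four corners, e.g.
    bottom(0) == left(0) and top(1) == right(1).
    """
    # Ruled surface between the bottom and top curves.
    ruled_u = lerp(bottom(u), top(u), v)
    # Ruled surface between the left and right curves.
    ruled_v = lerp(left(v), right(v), u)
    # Bilinear interpolation of the four corners, subtracted once
    # because both ruled surfaces already contain it.
    bilinear = lerp(lerp(bottom(0), bottom(1), u),
                    lerp(top(0), top(1), u), v)
    return tuple(a + b - c for a, b, c in zip(ruled_u, ruled_v, bilinear))


# Hypothetical example: a unit square whose bottom edge bulges upward.
bottom = lambda u: (u, 0.4 * u * (1 - u))
top = lambda u: (u, 1.0)
left = lambda v: (0.0, v)
right = lambda v: (1.0, v)

center = coons_patch(bottom, top, left, right, 0.5, 0.5)
```

The patch reproduces its boundary curves exactly, which is why a sparse layout of such patches can still follow curved image features.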
The restoration of digital images is of practical significance because degradation of digital image data on the internet is common. State-of-the-art image restoration methods usually employ end-to-end trained networks. However, we argue that a network trained with diverse image pairs is not optimal for restoring line drawings, which have extensive plain backgrounds. We propose a line-drawing restoration framework that takes a restoration neural network as its backbone and processes a degraded input line drawing in two steps. First, a mask-predicting network predicts a line mask that indicates the likely locations of foreground and background in the original line drawing. Next, we feed the degraded input line drawing together with the predicted line mask into the backbone restoration network. The traditional loss for the backbone restoration network is replaced with a masked mean squared error (MSE) loss. We test our framework on two classical image restoration tasks, JPEG restoration and super-resolution, and experiments demonstrate that our framework achieves better quantitative and visual results in most cases.
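One plausible reading of a masked MSE loss is sketched below: each pixel's squared error is weighted by the mask, and the sum is normalized by the total mask weight. The abstract does not specify the exact formulation, so the weighting scheme, the function name `masked_mse`, and the `eps` parameter are all assumptions for illustration.

```python
def masked_mse(pred, target, mask, eps=1e-8):
    """Mask-weighted mean squared error over 2D images.

    pred, target, mask: nested lists of floats with identical shape.
    mask values act as per-pixel weights (assumed convention: larger
    values mark pixels that matter more, e.g. predicted line pixels).
    eps guards against division by zero when the mask is all zeros.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            weighted_error += m * (p - t) ** 2
            total_weight += m
    return weighted_error / (total_weight + eps)


# Hypothetical 2x2 example: the prediction is wrong at two pixels.
pred = [[1.0, 0.0], [0.0, 0.0]]
target = [[0.0, 0.0], [0.0, 1.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]      # reduces to ordinary MSE
ignore_errors = [[0.0, 1.0], [1.0, 0.0]]  # masks out both wrong pixels

loss_uniform = masked_mse(pred, target, uniform)
loss_ignored = masked_mse(pred, target, ignore_errors)
```

With a uniform mask the function behaves like ordinary MSE; a non-uniform mask shifts the loss toward the regions it marks, which is the motivation for treating line pixels and plain background differently.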
Free-form bending is a kinematics-based forming process that allows the manufacturing of arbitrary 3D-bent components. To obtain the desired part, the tool kinematics are adjusted by comparing the target and obtained bending lines. While the target geometry consists of parametric CAD data, the obtained geometry is a surface mesh, making the extraction of the bending line a challenging task. In this paper, the reconstruction of the bending line for free-form bent components is presented. The strategy relies on the extraction of the cross-section centroids, for which a ray casting algorithm is developed and compared to an existing Voronoi-based method. Subsequently, the obtained points are used to fit a parametric NURBS model of the curve. The algorithm parameters are investigated with a sensitivity analysis, and its performance is evaluated with a defined error metric. Finally, the strategy is validated by comparing its results with those of a Voronoi-based algorithm and by investigating different cross-sections and geometries.
With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, the 3D virtual eyeglasses try-on application is well on its way to becoming a new trending solution that offers a “try on” option for selecting the perfect pair of eyeglasses from the comfort of one's own home. Reconstructing eyeglasses frames from a single image with traditional depth-based and image-based methods is extremely difficult because of their unique characteristics, such as a lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on a synthetic eyeglasses frame dataset that we construct, we first define a class-specific eyeglasses frame template with predefined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner that detect the predefined keypoints in a coarse-to-fine manner to estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach that produces correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, and per-pixel shading information, among others. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.
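To illustrate the deformation primitive named above, here is a minimal sketch of classic Bernstein-basis free-form deformation applied to a single point. It shows only the forward mapping from control points to a deformed position; the optimization of control points through differentiable rendering is not reproduced. The names `bernstein`, `regular_lattice`, and `ffd`, and the lattice layout, are assumptions for illustration.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def regular_lattice(dims):
    """Undeformed control lattice on the unit cube.

    dims = (l, m, n) are the polynomial degrees per axis, so the
    lattice holds (l + 1) * (m + 1) * (n + 1) control points.
    """
    l, m, n = dims
    return {(i, j, k): (i / l, j / m, k / n)
            for i in range(l + 1)
            for j in range(m + 1)
            for k in range(n + 1)}


def ffd(point, control, dims):
    """Deform a point given in local lattice coordinates (s, t, u) in [0, 1]^3.

    The result is the Bernstein-weighted sum of the control points.
    """
    s, t, u = point
    l, m, n = dims
    out = [0.0, 0.0, 0.0]
    for i in range(l + 1):
        bi = bernstein(l, i, s)
        for j in range(m + 1):
            bj = bernstein(m, j, t)
            for k in range(n + 1):
                w = bi * bj * bernstein(n, k, u)
                cp = control[(i, j, k)]
                for axis in range(3):
                    out[axis] += w * cp[axis]
    return tuple(out)


dims = (2, 2, 2)
lattice = regular_lattice(dims)
vertex = (0.3, 0.6, 0.9)

# With an undeformed lattice the mapping is the identity.
unchanged = ffd(vertex, lattice, dims)

# Shifting every control point by +1 in z translates the vertex by +1 in z.
shifted_lattice = {key: (x, y, z + 1.0) for key, (x, y, z) in lattice.items()}
shifted = ffd(vertex, shifted_lattice, dims)
```

Because every mesh vertex is a smooth function of a small set of control points, moving the control points deforms the whole template coherently, which makes the lattice a compact set of variables to optimize.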