3D point cloud generation plays a pivotal role in a wide range of applications, including robotics, medical imaging, autonomous driving, and virtual/augmented reality (VR/AR). However, generating high-quality point clouds remains challenging due to the irregular and unordered nature of point cloud data. Existing Transformer-based generative models suffer from quadratic computational complexity, which limits their ability to capture global contextual dependencies and often leads to the loss of critical geometric information. To address these limitations, we propose a novel diffusion-based framework for point cloud generation that integrates the Mamba state-space model, known for its linear complexity and strong long-sequence modeling capability, with convolutional layers. Specifically, Mamba captures global structural dependencies across time steps, while the convolutional layers refine local geometric details. To effectively leverage the strengths of both components, we introduce a learnable masking mechanism that dynamically fuses global and local features at the appropriate time steps, thereby exploiting their complementary advantages. Extensive experiments demonstrate that our model outperforms previous point cloud generative approaches such as TIGER and PVD in both quality and diversity. On the airplane category, our model improves 1-NNA under EMD by 9.28% over PVD and 1-NNA under CD by 1.72% over TIGER. Compared with recent baselines, our method consistently achieves significant gains across multiple evaluation metrics.
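The abstract only sketches the fusion architecture, so the following is a minimal, illustrative PyTorch sketch of one way a learnable, time-step-conditioned mask could blend a global (state-space) branch with a local (convolutional) branch. All names here (GatedFusionBlock, mask_net, global_branch) are hypothetical, not the paper's; the global branch is a placeholder linear layer standing in for an actual Mamba block (e.g., from the mamba_ssm package) so the sketch runs without external dependencies.

```python
import torch
import torch.nn as nn

class GatedFusionBlock(nn.Module):
    """Illustrative fusion of a global and a local feature branch via a
    learnable mask conditioned on the diffusion time-step embedding.
    This is a sketch of the general technique, not the paper's code."""

    def __init__(self, dim: int, time_dim: int = 128):
        super().__init__()
        # Placeholder for a Mamba state-space block (global structure).
        self.global_branch = nn.Linear(dim, dim)
        # Convolution over the point sequence (local geometric detail).
        self.local_branch = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        # Mask network: maps the time-step embedding to per-channel gates,
        # so the global/local balance can shift across diffusion steps.
        self.mask_net = nn.Sequential(
            nn.Linear(time_dim, dim), nn.SiLU(), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_points, dim); t_emb: (batch, time_dim)
        g = self.global_branch(x)                                  # global features
        l = self.local_branch(x.transpose(1, 2)).transpose(1, 2)   # local features
        mask = torch.sigmoid(self.mask_net(t_emb)).unsqueeze(1)    # (batch, 1, dim)
        # Convex combination: mask near 1 favors global, near 0 favors local.
        return mask * g + (1.0 - mask) * l


# Usage: fuse features for a batch of 2048-point clouds.
block = GatedFusionBlock(dim=64)
x = torch.randn(4, 2048, 64)
t_emb = torch.randn(4, 128)
out = block(x, t_emb)
print(out.shape)  # torch.Size([4, 2048, 64])
```

The sigmoid gate keeps the fusion a convex combination of the two branches, which is one common way to let a network learn, per channel and per time step, how much global versus local information to pass forward.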
