In this work, we present a new deep generative model that disentangles image shape from appearance through differentiable warping. We propose to model the deformation field with implicit neural representations and show that coordinate-based representations provide the necessary inductive bias. Unlike previous warping-based approaches, which tend to model only local, small-scale displacements, our method learns complex deformations and is not restricted to reversible mappings. We study the convergence of warping-based generative models and find that the high-frequency nature of textures leads to shattered gradients, slow convergence, and suboptimal solutions. To address this problem, we propose invertible blurring, which smooths the gradients and leads to improved results. To further facilitate the convergence of warping, we additionally train the deformation module as a vanilla GAN generator, guiding the learning process in a self-distillation manner. Our complete pipeline achieves competitive results on the LSUN Churches dataset. Finally, we demonstrate several applications of our model, including composable texture editing, controllable deformation editing, and keypoint detection.
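To make the central mechanism concrete, the following is a minimal PyTorch sketch of a coordinate-based deformation field combined with differentiable warping. It is an illustration under stated assumptions, not the authors' implementation: the module names, network width, and the use of bilinear `grid_sample` are all illustrative choices.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# a coordinate-based MLP models the deformation field, and the image is
# warped differentiably via bilinear sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationField(nn.Module):
    """Implicit neural representation: (x, y) coordinate -> 2D displacement."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # per-pixel (dx, dy) displacement
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.mlp(coords)

def warp(image: torch.Tensor, field: DeformationField) -> torch.Tensor:
    """Differentiably warp `image` (N, C, H, W) with the deformation field."""
    n, _, h, w = image.shape
    # Regular grid of normalized coordinates in [-1, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack([xs, ys], dim=-1)                 # (H, W, 2), (x, y) order
    flow = field(grid.reshape(-1, 2)).reshape(h, w, 2)   # predicted displacements
    warped_grid = (grid + flow).unsqueeze(0).expand(n, -1, -1, -1)
    # Bilinear sampling keeps the whole operation differentiable,
    # so gradients flow from the output image back into the MLP.
    return F.grid_sample(image, warped_grid, align_corners=True)

# Usage: warp a random image; gradients propagate into the deformation MLP.
img = torch.rand(1, 3, 64, 64)
out = warp(img, DeformationField())
out.mean().backward()
```

Note that nothing in this formulation constrains the displacement field to be invertible, which matches the abstract's claim that the method is not restricted to reversible mappings.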