The training of ResNets and neural ODEs can be formulated and analyzed from the perspective of optimal control. This paper proposes a dissipative formulation of the training of ResNets and neural ODEs for classification problems. Specifically, we consider a variant of the cross-entropy loss with label smoothing, used both as the loss function and as a regularization term in the stage cost. Based on this dissipative formulation, we prove that the training optimal control problems (OCPs) for ResNets and neural ODEs alike exhibit the turnpike phenomenon. We illustrate this finding with numerical results for the two-spirals and MNIST datasets. Crucially, our training formulation ensures that the transformation of the data from input to output is achieved in the first layers. In the subsequent layers, which constitute the turnpike, the data remains at an equilibrium, so these layers do not contribute to the learned transformation. In principle, they can be pruned after training, resulting in a network with only the necessary number of layers, thus simplifying hyperparameter tuning.
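To make the loss concrete, the following is a standard form of label-smoothed cross-entropy; the paper's exact stage-cost weighting is not stated here, so the smoothing parameter $\varepsilon$ and this notation are illustrative assumptions. With one-hot label $y$ over $K$ classes and smoothing parameter $\varepsilon \in [0, 1)$, the smoothed target and the loss on a network output $z$ read

% Standard label-smoothed cross-entropy (a sketch, not the paper's exact
% stage cost): tilde-y mixes the one-hot label with the uniform distribution,
% and the loss is the cross-entropy against the softmax of the output z.
\[
  \tilde{y}_k = (1 - \varepsilon)\, y_k + \frac{\varepsilon}{K},
  \qquad
  \ell(z, y) = -\sum_{k=1}^{K} \tilde{y}_k \log\bigl(\mathrm{softmax}(z)_k\bigr).
\]

Unlike the plain cross-entropy, this loss attains its minimum at a finite output $z$, which is what allows it to double as a regularizer in the stage cost.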
