The training of neural networks can be formulated as an optimal control problem for a dynamical system. The initial conditions of the dynamical system are given by the data, and the objective of the control problem is to transform the initial conditions into a form that can be easily classified or regressed using linear methods. This link between optimal control of dynamical systems and neural networks has proved beneficial from both a theoretical and a practical point of view. Several researchers have exploited it to investigate the stability of different neural network architectures and to develop memory-efficient training algorithms. In this paper, we also adopt the dynamical systems view of neural networks, but with a different aim from earlier works: we develop a novel distributed optimization algorithm. The proposed algorithm addresses the most significant obstacle for distributed algorithms for neural network optimization: the network weights cannot be updated until the forward propagation of the data and the backward propagation of the gradients are complete. Using the dynamical systems point of view, we interpret the layers of a (residual) neural network as the discretized dynamics of a dynamical system and exploit the relationship between the co-states (adjoints) of the optimal control problem and backpropagation. We then develop a parallel-in-time method that updates the parameters of the network without waiting for the forward or backward propagation to complete in full. We establish the convergence of the proposed algorithm. Preliminary numerical results suggest that the algorithm is competitive with, and more efficient than, state-of-the-art methods.
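
As a rough illustration of the correspondence between residual layers, co-states, and backpropagation, the following sketch writes out one common form of it. The notation is assumed here for illustration and is not taken from the paper: $x_k$ is the state at layer $k$, $\theta_k$ the layer parameters, $h$ a step size, $f$ the layer function, $\Phi$ a terminal loss, and $p_k$ the co-state.

```latex
% Forward propagation: a residual network read as a forward Euler
% discretization of \dot{x}(t) = f(x(t), \theta(t)), with x_0 given by the data.
\begin{align}
  x_{k+1} &= x_k + h\, f(x_k, \theta_k), \qquad k = 0, \dots, N-1, \\
% Backward propagation: the co-state (adjoint) recursion, p_k = \nabla_{x_k} J
% for the objective J = \Phi(x_N).
  p_N &= \nabla \Phi(x_N), \\
  p_k &= p_{k+1} + h \left( \frac{\partial f}{\partial x}(x_k, \theta_k) \right)^{\!\top} p_{k+1}, \\
% Gradient with respect to the parameters of layer k.
  \nabla_{\theta_k} J &= h \left( \frac{\partial f}{\partial \theta}(x_k, \theta_k) \right)^{\!\top} p_{k+1}.
\end{align}
```

Under these assumptions, the gradient for layer $k$ depends on $x_k$ from the forward sweep and $p_{k+1}$ from the backward sweep, which is the sequential dependency that a parallel-in-time method aims to relax.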