Convergence of policy gradient methods for stochastic control problems
Policy gradient (PG) methods have demonstrated remarkable success in a wide range of sequential decision-making tasks. However, the majority of research efforts have focused on discrete-time problems, leaving the convergence analysis of PG methods for controlled diffusions as an unresolved issue. This work proves the convergence of PG methods for finite-horizon linear-quadratic control problems. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. We propose geometry-aware gradient descents for the mean and covariance of the policy, using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm. If time allows, extensions of the algorithm to nonlinear control problems will be discussed.
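To make the setting concrete, the sketch below runs a plain (not geometry-aware) policy gradient on a scalar, discrete-time, finite-horizon linear-quadratic problem with a Gaussian policy whose mean is linear in the state, u = k·x + σ·ε, and whose variance σ² is held fixed. All system parameters (a, b, q, r, the horizon, the step size, and the number of rollouts) are illustrative assumptions, not values from the work; the gradient is estimated with the standard score-function (REINFORCE) identity rather than the Fisher/Bures-Wasserstein updates described above.

```python
import random

def rollout(k, sigma, T=10, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0, rng=random):
    """Simulate one trajectory of x_{t+1} = a*x + b*u under u = k*x + sigma*eps.
    Returns per-step quadratic costs and the score terms d/dk log pi(u|x)."""
    x = x0
    costs, scores = [], []
    for _ in range(T):
        u = k * x + sigma * rng.gauss(0.0, 1.0)
        costs.append(q * x * x + r * u * u)
        # For a Gaussian policy N(k*x, sigma^2): d/dk log pi = (u - k*x) * x / sigma^2
        scores.append((u - k * x) * x / (sigma * sigma))
        x = a * x + b * u
    return costs, scores

def pg_step(k, sigma, n_rollouts=2000, lr=0.005, rng=random):
    """One Monte Carlo policy gradient step on the mean gain k (variance fixed)."""
    grad = 0.0
    for _ in range(n_rollouts):
        costs, scores = rollout(k, sigma, rng=rng)
        ctg, g = 0.0, 0.0
        # Weight each score by the cost-to-go (causality-respecting REINFORCE).
        for t in reversed(range(len(costs))):
            ctg += costs[t]
            g += scores[t] * ctg
        grad += g
    grad /= n_rollouts
    return k - lr * grad  # gradient *descent*: we minimise expected cost

rng = random.Random(0)
k, sigma = 0.0, 0.2
for _ in range(40):
    k = pg_step(k, sigma, rng=rng)
print(round(k, 3))  # negative feedback gain, moving toward the optimal LQR gain
```

With these assumed parameters the iterates move from the zero gain toward the negative optimal LQR feedback gain; the geometry-aware updates analysed in the work are designed to make this kind of descent provably linear and robust to the action frequency.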