Reinforcement Learning for Convex MDPs with application to hedging and pricing
Convex MDPs generalize the standard reinforcement learning (RL) problem formulation to a larger framework that includes many supervised and unsupervised RL problems, such as apprenticeship learning, constrained MDPs, and so-called ‘pure exploration’. Using duality, we reformulate the convex MDP problem as a min-max game between a policy ‘player’ and a cost (negative reward) ‘player’. We then study the application of this strategy to pricing and hedging under optimized certainty equivalents (OCEs), a family of risk measures widely used by practitioners and academics. This class of risk measures includes many important examples, e.g. entropic risk measures and average value at risk.
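To make the OCE family concrete, the following is a minimal sketch that assumes one common loss-based convention, OCE(X) = inf_w { w + E[l(X - w)] }, applied to a sample of losses X. The function name `oce`, the two loss functions, the parameters `gamma` and `alpha`, and the ternary-search solver are illustrative assumptions rather than anything specified in the text. Under this convention, the exponential loss should recover the entropic risk measure and the scaled positive-part loss should recover the average value at risk, so the sketch compares the numerical values against the known closed forms.

```python
import numpy as np

def oce(losses, loss_fn, iters=200):
    """Approximate inf_w { w + mean(loss_fn(losses - w)) } on a sample.

    Assumption: loss_fn is convex, so the objective is convex in w and
    ternary search over [min(losses), max(losses)] locates the minimum.
    """
    def objective(w):
        return w + np.mean(loss_fn(losses - w))

    lo, hi = float(np.min(losses)), float(np.max(losses))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))

# Hypothetical sample of hedging losses (standard normal, fixed seed).
rng = np.random.default_rng(0)
X = rng.standard_normal(1000)

# Entropic risk: l(x) = (exp(gamma * x) - 1) / gamma.
gamma = 1.0
entropic_num = oce(X, lambda x: (np.exp(gamma * x) - 1.0) / gamma)
entropic_closed = np.log(np.mean(np.exp(gamma * X))) / gamma

# Average value at risk at level alpha: l(x) = max(x, 0) / (1 - alpha).
alpha = 0.9
avar_num = oce(X, lambda x: np.maximum(x, 0.0) / (1.0 - alpha))
# With 1000 samples and alpha = 0.9, the empirical AVaR is the mean of
# the 100 largest losses.
avar_closed = np.mean(np.sort(X)[-100:])

print(entropic_num, entropic_closed)
print(avar_num, avar_closed)
```

Each pair of printed values should agree up to numerical tolerance. Ternary search is used only to keep the sketch dependency-free; any one-dimensional convex solver would serve the same purpose.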