Probability Colloquium
Date
Time
17:00-18:00
Location:
TUB, MA 041
Yufei Zhang (LSE)

Exploration-exploitation trade-off for continuous-time reinforcement learning

Recently, reinforcement learning (RL) has attracted substantial research interest. Much of the attention and success, however, has been in the discrete-time setting. Continuous-time RL, despite its natural analytical connection to stochastic control, remains largely unexplored, with limited progress. In particular, characterising sample efficiency for continuous-time RL algorithms is still a challenging open problem.

In this talk, we develop a framework to analyse model-based reinforcement learning in the episodic setting. We then apply it to optimise the exploration-exploitation trade-off for linear-convex RL problems, and report sublinear (or even logarithmic) regret bounds for a class of learning algorithms inspired by filtering theory. The approach is probabilistic: it analyses learning efficiency using concentration inequalities for correlated continuous-time observations, and applies stochastic control theory to quantify the performance gap between greedy policies derived from estimated and true models.

Probability Colloquium
Date
Time
16:00-17:00
Location:
TUB MA 041
Guanxing Fu (HK PolyU)

Mean field portfolio games

First, I will discuss a mean field portfolio game in a general framework. Using a dynamic programming principle and a martingale optimality principle, I establish a one-to-one correspondence between the Nash equilibrium and a BSDE. This correspondence is key to the uniqueness of Nash equilibria. In general, the BSDE can be solved under a weak interaction assumption. Motivated by this assumption, I will introduce an asymptotic expansion of the game value in terms of the interaction parameter. Second, I will incorporate consumption into the portfolio game and show that the equilibrium investment and consumption can be fully characterized by one BSDE.

Mathematical Finance Seminar
Date
Time
17:00-18:00
Location:
TUB, MA043
Sigrid Källblad (KTH Stockholm)

Adapted Wasserstein distance between the laws of SDEs

We consider an adapted optimal transport problem between the laws of Markovian stochastic differential equations (SDEs) and establish optimality of the so-called synchronous coupling between the given laws. The proof of this result is based on time-discretisation methods and reveals an interesting connection between the synchronous coupling and the celebrated discrete-time Knothe–Rosenblatt rearrangement. We also provide a related result on equality of various topologies when restricted to certain laws of continuous-time stochastic processes. The result is of relevance for the study of stability with respect to model specification in mathematical finance.

The talk is based on joint work with Julio Backhoff-Veraguas and Ben Robinson.

Probability Colloquium
Date
Time
17:00-18:00
Location:
TUB, MA041
Giorgia Callegaro (University of Padova)

A McKean-Vlasov game of commodity production, consumption and trading

We propose a model where a producer and a consumer can affect the price dynamics of a commodity by controlling the drift and volatility of, respectively, the production rate and the consumption rate. We assume that the producer has a short position in a forward contract on λ units of the underlying at a fixed price F, while the consumer has the corresponding long position. Moreover, both players are risk-averse with respect to their financial position and their risk aversions are modelled through an integrated-variance penalization. We study the impact of risk aversion on the interaction between the producer and the consumer as well as on the derivative price. In mathematical terms, we are dealing with a two-player linear-quadratic McKean–Vlasov stochastic differential game. Using methods based on the martingale optimality principle and BSDEs, we find a Nash equilibrium and characterize the corresponding strategies and payoffs in semi-explicit form. Furthermore, we compute the two indifference prices (one for the producer and one for the consumer) induced by that equilibrium and we determine the quantity λ such that the players agree on the price. Finally, we illustrate our results with some numerics. In particular, we focus on how the risk aversions and the volatility control costs of the players affect the derivative price.

This is a joint paper with R. Aid, O. Bonesini and L. Campi.

Probability Colloquium
Date
Time
16:00-17:00
Location:
TUB, MA041
Günter Last (KIT)

Poisson hulls and nonparametric boundary models

We consider a Poisson point process on a general state space. Using an axiomatic approach, we introduce a hull as a random subset of the state space determined by this process. A key example is the convex hull of a finite Poisson process in Euclidean space. In the first part of the talk we shall provide some first properties along with other examples. Forming conditional expectations, Poisson hulls can be used as natural estimators of linear functions of the underlying intensity measure. Using a spatial Markov property, we will derive some fundamental properties of these estimators. In particular we shall discuss moment formulas and the connection to the (anticipating) stochastic Kabanov–Skorohod integral. In the second part of the talk we shall discuss central limit theorems for growing intensities. Our method is based on the Stein-Malliavin approach and yields presumably optimal rates of convergence. Finally we present an application to nonparametric boundary models.

The talk is based on joint work with Ilya Molchanov (Bern).

Mathematical Finance Seminar
Date
Time
17:00-18:00
Location:
TUB MA 041
Tahir Choulli (U Alberta)

Risk Quantification, Optimal Stopping and Reflected BSDEs for a Class of Informational Markets

Mathematical Finance Seminar
Date
Time
16:00-17:00
Location:
TUB MA 041
Paul Hager (HU Berlin)

Mean-field liquidation games with market drop-out

Mathematical Finance Seminar
Date
Time
17:00-18:00
Location:
TUB MA 041
Eyal Neuman (Imperial College)

Equilibrium in Infinite-Dimensional Stochastic Games with Mean-Field Interaction

We consider a general class of finite-player stochastic games with mean-field interaction, in which the linear-quadratic objective functional includes linear operators acting on square-integrable controls. We propose a novel approach for deriving explicitly the Nash equilibrium of the game by reducing the associated first-order conditions to a system of stochastic Fredholm equations of the second kind and deriving their closed-form solution. Furthermore, by proving stability results for the system of Fredholm equations, we derive the convergence of the equilibrium of the N-player game to the corresponding mean-field equilibrium. As a by-product of our results, we also derive an epsilon-Nash equilibrium for the mean-field game, and we show that the conditions for existence of an equilibrium in the mean-field limit are significantly less restrictive than in the finite-player game. Finally, we apply our general framework to solve various examples, such as stochastic Volterra linear-quadratic games, models of systemic risk and advertising with delay, and optimal liquidation games with transient price impact.

The talk is based on joint work with Eduardo Abi-Jaber and Moritz Voss.

Mathematical Finance Seminar
Date
Time
16:00-17:00
Location:
TUB MA 041
Denis Belomestny (U Duisburg-Essen)

Reinforcement Learning for Convex MDPs with application to hedging and pricing

Convex MDPs generalize the standard reinforcement learning (RL) problem formulation to a larger framework that includes many supervised and unsupervised RL problems, such as apprenticeship learning, constrained MDPs, and so-called ‘pure exploration’. Using duality, we reformulate the convex MDP problem as a min-max game between policy and cost (negative reward) ‘players’. We then study the application of this strategy to pricing and hedging under optimized certainty equivalents (OCEs), a family of risk measures widely used by practitioners and academics. This class of risk measures includes many important examples, e.g. entropic risk measures and average value at risk.

Probability Colloquium
Date
Time
17:00-18:00
Location:
TUB MA 041
Sam Cohen (Oxford)

Stability and approximation of projection filters

Nonlinear filtering is a central mathematical tool in understanding how we process information. Sadly, the equations involved are often very high dimensional, which may lead to difficulties in applications. One possible resolution (due to D. Brigo and collaborators) is to replace the filter by a low-dimensional approximation, with hopefully small error. In this talk we will see how, in the case where the underlying process is a finite-state Markov chain, results on the stability of filters can be strengthened to show that this approximation introduces a well-controlled error, leveraging tools from information geometry. (Based on joint work with Eliana Fausti.)