Optimal transport and Wasserstein distances for causal models
We introduce a variant of optimal transport adapted to the causal structure given by an underlying directed graph G. Different graph structures lead to different specifications of the optimal transport problem. For instance, a fully connected graph yields standard optimal transport, a linear graph structure corresponds to causal optimal transport between the distributions of two discrete-time stochastic processes, and an empty graph leads to a notion of optimal transport related to CO-OT, Gromov–Wasserstein distances, and factored OT. We derive several characterizations of G-causal transport plans and introduce Wasserstein distances between causal models that respect the underlying graph structure. We show that average treatment effects are continuous with respect to G-causal Wasserstein distances and that small perturbations of structural causal models lead to small deviations in G-causal Wasserstein distance. We also introduce an interpolation between causal models based on G-causal Wasserstein distance and compare it to standard Wasserstein interpolation.
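As a point of reference for the fully connected case, where the problem reduces to standard optimal transport, the following minimal sketch computes the standard Wasserstein-1 distance between two discrete distributions on the real line. It is an illustration only and not the G-causal construction: it assumes both distributions are given as probability vectors on a common, strictly increasing support, and it uses the well-known one-dimensional identity that W1 equals the integral of the absolute difference of the two cumulative distribution functions. The function name `wasserstein1_1d` is hypothetical.

```python
from itertools import accumulate


def wasserstein1_1d(support, mu, nu):
    """Standard W1 between probability vectors mu and nu on a shared 1-D support.

    Assumption: `support` is strictly increasing and mu, nu each sum to 1.
    This is plain optimal transport, not the G-causal variant.
    """
    if not (len(support) == len(mu) == len(nu)):
        raise ValueError("support, mu and nu must have the same length")
    if any(b <= a for a, b in zip(support, support[1:])):
        raise ValueError("support must be strictly increasing")

    # Cumulative distribution functions evaluated at each support point.
    cdf_mu = list(accumulate(mu))
    cdf_nu = list(accumulate(nu))

    # Both CDFs are constant between consecutive support points, so the
    # integral of |F_mu - F_nu| is a finite sum over the gaps.
    gaps = [b - a for a, b in zip(support, support[1:])]
    return sum(abs(f - g) * gap for f, g, gap in zip(cdf_mu, cdf_nu, gaps))


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0]
    # All mass at 0 versus all mass at 2: every unit of mass travels distance 2.
    print(wasserstein1_1d(points, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
```

In this sketch every coupling of the two marginals is admissible. A G-causal distance would restrict the admissible couplings according to the graph G, so it should not be read off from this computation.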