A Unifying Approach to Distributional Limits for Empirical Optimal Transport

Authors

Hundrieser S, Klatt M, Munk A, Staudt T

Journal

Bernoulli

Citation

Bernoulli 30 (4), 2846-2877.

Abstract

We provide a unifying approach to central limit type theorems for empirical optimal transport (OT). The limit distribution is given by a supremum of a centered Gaussian process, and we explicitly characterize when it is centered normal or degenerates to a Dirac measure. Moreover, in contrast to recent contributions to distributional limit laws for empirical OT on Euclidean spaces, which require centering around the expectation of the empirical OT cost, the limits obtained here are centered around the population quantity, which is well-suited for statistical applications such as goodness-of-fit testing and randomized OT computation. Overall, our distributional limits are valid if one of the population probability measures is of intrinsic dimension at most three. At the heart of our theory lies Kantorovich duality, which represents the OT cost as a supremum over a function class Fc for an underlying sufficiently regular and possibly unbounded cost function c. In this regard, OT is considered as a functional defined on ℓ∞(Fc), the Banach space of bounded functionals from Fc to ℝ, equipped with the uniform norm. We prove the OT functional to be Hadamard directionally differentiable and conclude distributional convergence for increasing sample size via a functional delta method that necessitates weak convergence of an underlying empirical process in ℓ∞(Fc). The latter can be handled with empirical process theory and requires Fc to be a Donsker class. We give sufficient conditions, depending on the dimension of the ground space, the underlying cost function, and the probability measures under consideration, to guarantee the Donsker property. Altogether, our approach reveals a noteworthy trade-off inherent in central limit theorems for empirical OT: Kantorovich duality requires Fc to be sufficiently rich, while weak convergence of the underlying empirical processes only occurs if Fc is not too complex.
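
As a concrete picture of the empirical OT cost discussed above, the following minimal sketch covers only the simplest special case and is not the paper's method. It assumes data on the real line, the cost c(x, y) = |x - y|^p with p >= 1, and two samples of equal size with uniform weights; in that setting an optimal plan pairs the order statistics, so the cost is an average over sorted pairs. The function name `empirical_ot_1d` is hypothetical and chosen for illustration.

```python
def empirical_ot_1d(xs, ys, p=2):
    """Empirical OT cost between two equal-size samples on the real line.

    Assumptions (illustrative, not from the paper): cost c(x, y) = |x - y|**p
    with p >= 1 and uniform weights 1/n on each sample point. Under these
    assumptions the monotone (sorted) matching is an optimal transport plan.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("the sorted matching is only optimal for p >= 1")

    # Pair the i-th smallest point of xs with the i-th smallest point of ys.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)

    # Average transport cost over the n matched pairs.
    total = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted))
    return total / len(xs)
```

For example, `empirical_ot_1d([0.0, 1.0], [2.0, 1.0])` pairs 0 with 1 and 1 with 2 and returns `1.0`. In higher dimensions or for unequal sample sizes no such closed form is available, and the cost has to be computed with a general OT solver.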

DOI

10.3150/23-BEJ1697