Authors
Hundrieser S, Staudt T, Munk A
Journal
Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques
Citation
Ann. Inst. H. Poincaré Probab. Statist. 60(2): 824-846.
Abstract
The empirical optimal transport (OT) cost between two probability measures from random data is a fundamental quantity in transport based data analysis. In this work, we derive novel guarantees for its convergence rate when the involved measures are different, possibly supported on different spaces. Our central observation is that the statistical performance of the empirical OT cost is determined by the less complex measure, a phenomenon we refer to as lower complexity adaptation of empirical OT. For instance, under Lipschitz ground costs, we find that the expected error between the empirical OT cost based on $n$ observations and the population quantity decreases with rate $n^{-1/d}$ if one of the two measures is concentrated on a $d$-dimensional manifold, while the other can be arbitrary. For semi-concave ground costs, we show that the upper bound for the rate improves to $n^{-2/d}$. Similarly, our theory establishes the general convergence rate $n^{-1/2}$ for semi-discrete OT. All of these results are valid in the two-sample case as well. Our findings therefore suggest that the curse of dimensionality only affects the estimation of the OT cost when both measures exhibit a high intrinsic dimension. Our proofs are based on the dual formulation of OT as a maximization over a suitable function class $\mathcal{F}_c$ and the observation that the $c$-transform of $\mathcal{F}_c$ under bounded costs has the same uniform metric entropy as $\mathcal{F}_c$ itself.
DOI