TY - JOUR
T1 - Hilbert Curve Projection Distance for Distribution Comparison
AU - Li, Tao
AU - Meng, Cheng
AU - Xu, Hongteng
AU - Yu, Jun
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called the Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using the Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that the HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the Lp cost in d-dimensional space converges to its population counterpart at a rate of no more than O(n^{-1/(2 max{d,p})}). To suppress the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
AB - Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called the Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using the Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that the HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the Lp cost in d-dimensional space converges to its population counterpart at a rate of no more than O(n^{-1/(2 max{d,p})}). To suppress the curse of dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
KW - Distribution comparison
KW - Hilbert curve
KW - Wasserstein distance
KW - optimal transport
KW - projection robust Wasserstein distance
UR - http://www.scopus.com/inward/record.url?scp=85187287473&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2024.3363780
DO - 10.1109/TPAMI.2024.3363780
M3 - Article
AN - SCOPUS:85187287473
SN - 0162-8828
VL - 46
SP - 4993
EP - 5007
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 7
ER -