Ordered Gromov-Hausdorff Metric: A New Tool for Comparative Analysis of Protein Structures
Abstract
Motivation
Classical protein structure comparison metrics such as RMSD and TM-score effectively assess geometric similarity but ignore the linear order of amino acid residues (Zhang and Skolnick, 2004). The Gromov-Hausdorff (GH) metric compares metric spaces by shape but likewise does not account for order (Gromov, 1981). As a result, proteins with swapped domains can be incorrectly identified as similar. We introduce the Ordered Gromov-Hausdorff (OGH) metric, defined on ordered metric spaces, which incorporates residue order into the comparison.
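The swapped-domain problem can be shown with a minimal Python sketch. It uses toy data rather than real proteins, and the coordinates and the helper names `pairwise_distances`, `sorted_distance_signature`, and `index_aligned_distortion` are hypothetical, not taken from the OGH code. Two chains are built from the same two domains in opposite order. As unordered point sets they are identical, so they are isometric and their GH distance is zero, and an order-blind summary such as the sorted list of pairwise distances cannot tell them apart. Comparing d(i, j) residue by residue, which presumes that residue i in one chain corresponds to residue i in the other, does expose the swap.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix d(i, j) for an (n, 3) coordinate array."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sorted_distance_signature(x):
    """Order-blind summary: all pairwise distances, sorted (residue indices discarded)."""
    d = pairwise_distances(x)
    upper = np.triu_indices(len(d), k=1)
    return np.sort(d[upper])


def index_aligned_distortion(x, y):
    """Order-aware check: max |d_X(i, j) - d_Y(i, j)| with residue i matched to residue i."""
    dx, dy = pairwise_distances(x), pairwise_distances(y)
    if dx.shape != dy.shape:
        raise ValueError("index-aligned comparison needs equal lengths")
    return float(np.abs(dx - dy).max())


rng = np.random.default_rng(1)
domain_a = rng.normal(size=(20, 3))                               # toy domain A
domain_b = rng.normal(size=(25, 3)) + np.array([8.0, 0.0, 0.0])   # toy domain B, offset in space
protein_1 = np.vstack([domain_a, domain_b])  # order: A then B
protein_2 = np.vstack([domain_b, domain_a])  # same points, order: B then A

print(np.allclose(sorted_distance_signature(protein_1),
                  sorted_distance_signature(protein_2)))  # True: same unordered point set
print(index_aligned_distortion(protein_1, protein_2))     # clearly > 0: residue order differs
```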
Results
OGH combines coordinate normalization, an exponential penalty for order violations, and a monotonic alignment algorithm with computational complexity O(n·w), where n is the number of residues and w is the search window width. We prove that OGH satisfies all metric axioms for any weight parameter α > 0. Its analytical properties include invariance under isometries, upper boundedness, Lipschitz continuity under small coordinate perturbations, and concavity in α. On the VAD dataset (28 viral proteins from HIV-1, SARS-CoV-2, and MERS-CoV), OGH increases monotonically with the degree of residue shuffling (up to 0.363 at 100% shuffling) and correlates strongly with TM-score (r = 0.706). When separating homologs at fixed global similarity (TM-score ≈ 0.5), OGH achieves AUC = 0.800, whereas TM-score gives AUC = 0.467, demonstrating that OGH detects conserved residue order even when global geometry is not conserved.
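The O(n·w) cost of a windowed, order-preserving alignment can be illustrated with a hypothetical Python sketch. It is not the published OGH formula. The normalization (centroid at the origin, unit RMS radius), the per-residue descriptor (distance from the centroid), the DTW-style recurrence, and the bounded transform 1 - exp(-α·cost) are illustrative assumptions. They were chosen only so that the toy score shares a few properties listed above: zero for identical inputs, invariance under rigid motions, an upper bound of 1, and concavity in α. Each residue i may only be matched to residues j with |i - j| ≤ w, so the dynamic program fills O(n·w) cells. The names `normalize`, `banded_monotone_cost`, and `toy_ordered_score` are hypothetical.

```python
import numpy as np


def normalize(coords):
    """Assumed normalization: centroid at the origin, unit root-mean-square radius."""
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)
    rms = np.sqrt((x ** 2).sum(axis=1).mean())
    return x / rms if rms > 0 else x


def banded_monotone_cost(a, b, w):
    """Order-preserving (DTW-style) alignment of 1-D profiles a and b.

    Only cells with |i - j| <= w are filled, i.e. O(n*w) work; the full
    (n+1) x (m+1) table is allocated here purely for simplicity.
    """
    n, m = len(a), len(b)
    if abs(n - m) > w:
        raise ValueError("window w must be at least |len(a) - len(b)|")
    dp = np.full((n + 1, m + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Moves only forward along both chains, so residue order is never reversed.
            dp[i, j] = step + min(dp[i - 1, j - 1], dp[i - 1, j], dp[i, j - 1])
    return dp[n, m] / max(n, m)


def toy_ordered_score(x, y, alpha=1.0, w=5):
    """Hypothetical order-aware dissimilarity in [0, 1); NOT the published OGH."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rx = np.linalg.norm(normalize(x), axis=1)  # distance of each residue from the centroid
    ry = np.linalg.norm(normalize(y), axis=1)
    cost = banded_monotone_cost(rx, ry, w)
    return float(1.0 - np.exp(-alpha * cost))  # bounded above by 1, concave in alpha


rng = np.random.default_rng(0)
chain = np.cumsum(rng.normal(size=(60, 3)), axis=0)  # toy random-walk "backbone"
t = 0.7
rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                [np.sin(t), np.cos(t), 0.0],
                [0.0, 0.0, 1.0]])
moved = chain @ rot.T + np.array([5.0, -2.0, 1.0])  # rigid motion of the same chain
shuffled = chain[rng.permutation(len(chain))]       # same residues, order destroyed

print(toy_ordered_score(chain, moved))     # ~0: invariant under rigid motions
print(toy_ordered_score(chain, shuffled))  # > 0: shuffled order cannot be aligned at zero cost
```

Shuffling leaves the unordered set of radial distances unchanged but scrambles their sequence along the chain, so the monotone alignment can no longer pair equal values. This mirrors, in toy form, the shuffling experiment described above; the sketch does not reproduce the reported values.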
Availability
The Python source code for OGH is freely available at https://github.com/andytimoffilim/OGH . The VAD dataset (PDB IDs listed in the paper) is publicly accessible from the RCSB Protein Data Bank (Berman et al., 2000; wwPDB, 2019).
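Any order-aware comparison needs residue coordinates in chain order. The hedged Python sketch below is not code from the OGH repository. It extracts one Cα coordinate per residue from PDB-format text, such as a file downloaded from the RCSB PDB, using the fixed column positions of the PDB coordinate format. Several simplifications are assumptions: the function reads ATOM records for a single chain, stops after the first model, and keeps the first alternate location per residue. The function name `ca_trace` and the inline sample records are hypothetical.

```python
import numpy as np


def ca_trace(pdb_text, chain_id="A"):
    """C-alpha coordinates (N x 3) of one chain, in the order they appear in the file.

    Fixed PDB columns (1-based): atom name 13-16, altLoc 17, chain 22,
    resSeq 23-26, insertion code 27, x 31-38, y 39-46, z 47-54.
    Simplifications (assumptions): ATOM records only, first model only,
    first alternate location kept per residue.
    """
    coords, seen = [], set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
        if not line.startswith("ATOM  ") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        residue = (line[22:26], line[26])  # resSeq + insertion code
        if residue in seen:
            continue  # alternate location of a residue already read
        seen.add(residue)
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return np.array(coords, dtype=float).reshape(-1, 3)


sample = "\n".join([
    "ATOM      1  N   MET A   1      37.000  18.000  27.000  1.00 25.00",
    "ATOM      2  CA  MET A   1      38.198  19.582  28.343  1.00 25.00",
    "ATOM     10  CA ALYS A   2      39.000  20.000  29.500  0.50 25.00",
    "ATOM     11  CA BLYS A   2      40.000  21.000  30.000  0.50 25.00",
    "HETATM  100 ZN    ZN A 101      10.000  10.000  10.000  1.00 30.00",
    "END",
])
ca = ca_trace(sample)
print(ca.shape)  # (2, 3)
```

Slicing fixed columns instead of splitting on whitespace matters because adjacent PDB fields can run together, for example when coordinates are large and negative.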