Ordered Gromov-Hausdorff Metric: A New Tool for Comparative Analysis of Protein Structures

Abstract

Motivation

Classical protein structure comparison metrics such as RMSD and TM-score effectively assess geometric similarity but ignore the linear order of amino acid residues (Zhang and Skolnick, 2004). The Gromov-Hausdorff (GH) metric compares metric spaces by shape but likewise does not account for order (Gromov, 1981). As a result, proteins with swapped domains can be incorrectly identified as similar. We introduce the Ordered Gromov-Hausdorff (OGH) metric, defined on ordered metric spaces, to incorporate residue order into the comparison.
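The order-blindness of purely set-based distances can be seen on a toy example. The sketch below is illustrative only and is not taken from the paper: it uses the plain Hausdorff distance between point sets (a simpler relative of the GH metric) and a hypothetical index-matched deviation, `index_matched_rms`, which stands in for any order-aware comparison and is not the OGH definition.

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(P, Q):
        # Largest distance from a point of P to its nearest point in Q.
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

def index_matched_rms(A, B):
    """RMS deviation between points paired by sequence index (hypothetical helper)."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(A, B)) / len(A))

# Toy chain of six "residues" placed on a line, one unit apart.
chain = [(float(i), 0.0, 0.0) for i in range(6)]

# Swap the two halves ("domains") of the sequence without moving any point.
swapped = chain[3:] + chain[:3]

set_distance = hausdorff(chain, swapped)            # ignores order
order_distance = index_matched_rms(chain, swapped)  # depends on order
```

Here the two sequences occupy exactly the same points, so the set-based distance is zero, while the index-matched comparison registers the swap.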

Results

OGH combines coordinate normalization, an exponential penalty for order violations, and a monotonic alignment algorithm with computational complexity O(n·w), where n is the number of residues and w is the search window width. We prove that OGH satisfies all metric axioms for α > 0. Its analytical properties include invariance under isometries, boundedness from above, Lipschitz continuity under small coordinate perturbations, and concavity in the weight parameter α. On the VAD dataset (28 viral proteins from HIV-1, SARS-CoV-2, and MERS-CoV), OGH increases monotonically with residue shuffling (up to 0.363 at 100% shuffling) and correlates strongly with TM-score (r = 0.706). In the task of separating homologs at fixed global similarity (TM-score ≈ 0.5), OGH achieves AUC = 0.800, whereas TM-score gives AUC = 0.467, demonstrating that OGH detects conserved order even when global geometry is not conserved.
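To make the O(n·w) cost concrete, the following is a minimal sketch of a banded monotonic alignment, assuming a dynamic-time-warping style recurrence. It is not the authors' implementation: the function names `normalize` and `banded_alignment_cost`, the choice of normalization (centering and scaling to unit radius), and the penalty form 1 − exp(−α·|i − j| / max(n, m)) are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def normalize(coords):
    """Center 3D points at their centroid and scale to unit radius (assumed scheme)."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    centered = [tuple(p[k] - centroid[k] for k in range(3)) for p in coords]
    scale = max(math.hypot(*p) for p in centered) or 1.0  # avoid division by zero
    return [tuple(x / scale for x in p) for p in centered]

def banded_alignment_cost(A, B, w, alpha=1.0):
    """Cost of the cheapest monotonic alignment with |i - j| <= w.

    Only about n * (2w + 1) cells are visited, hence O(n * w) time.
    """
    n, m = len(A), len(B)
    if n == 0 or m == 0:
        raise ValueError("both chains must be non-empty")
    if abs(n - m) > w:
        raise ValueError("window too narrow to align chains of these lengths")
    D = {}  # sparse table: only cells inside the band are stored
    for i in range(n):
        for j in range(max(0, i - w), min(m, i + w + 1)):
            # Geometric mismatch plus an assumed exponential index-offset penalty.
            penalty = 1.0 - math.exp(-alpha * abs(i - j) / max(n, m))
            step = math.dist(A[i], B[j]) + penalty
            if i == 0 and j == 0:
                best = 0.0
            else:
                # Monotonic moves only: diagonal, down, or right.
                best = min(D.get((i - 1, j - 1), math.inf),
                           D.get((i - 1, j), math.inf),
                           D.get((i, j - 1), math.inf))
            D[(i, j)] = step + best
    return D[(n - 1, m - 1)]

# Usage: compare a toy chain with itself and with its reversal.
chain = normalize([(float(i), 0.0, 0.0) for i in range(8)])
same_cost = banded_alignment_cost(chain, chain, w=2)
reversed_cost = banded_alignment_cost(chain, chain[::-1], w=2)
```

Storing the table as a dictionary keyed by in-band cells keeps memory proportional to n·w as well; a production implementation would more likely use a NumPy array of shape (n, 2w + 1).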

Availability

The Python source code for OGH is freely available at https://github.com/andytimoffilim/OGH. The VAD dataset (PDB IDs listed in the paper) is publicly accessible from the RCSB Protein Data Bank (Berman et al., 2000; wwPDB, 2019).
