MELO-ED: learning locality-sensitive multi-embeddings for edit distance

Xin Yuan
Ke Chen
Ajmain Yasar Ahmed
Mingfu Shao

Abstract

Edit distance is a fundamental metric for quantifying similarity between biological sequences, but its high computational cost limits large-scale applications. Previously, we proposed learned locality-sensitive bucketing (LSB) functions that achieved superior performance and efficiency compared to classical seeding methods for identifying similar and dissimilar sequences. However, each component of an LSB function is represented as a one-dimensional hash value that can only be compared for identity, which constrains the method’s accuracy. Here, we introduce MELO-ED, a multi-embedding locality-sensitive framework that upgrades each hash value to a higher-dimensional embedding capable of efficiently approximating edit distance. MELO-ED employs a Siamese convolutional neural architecture that learns complementary embeddings capturing both global sequence context and fine-grained edit operations. By integrating locality-sensitive bucketing with multi-embedding representations, MELO-ED achieves near-perfect accuracy without increasing the number of buckets required. Leveraging mature indexing methods in the embedding space, MELO-ED transforms time-consuming edit distance computations into scalable similarity searches across massive genomic databases. Comprehensive evaluations on simulated DNA sequences and real barcode datasets demonstrate that MELO-ED outperforms both traditional alignment-free methods and contemporary machine learning approaches, including our previously developed learned LSB functions. These results establish MELO-ED as a state-of-the-art framework for fast and accurate classification of similar and dissimilar sequences. MELO-ED is available at https://github.com/Shao-Group/MELO-ED.
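
For readers unfamiliar with the cost the abstract refers to, the following is a minimal sketch of the classical dynamic program for edit distance (Levenshtein distance), which takes time proportional to the product of the two sequence lengths for every pair compared. It is not taken from the MELO-ED repository; the function names `edit_distance` and `is_similar` and the `threshold` parameter are illustrative assumptions, and the threshold test only illustrates the kind of similar/dissimilar classification the abstract describes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classical two-row dynamic program.

    Illustrative baseline only; not the MELO-ED method.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def is_similar(a: str, b: str, threshold: int) -> bool:
    """Hypothetical helper: call two sequences similar if their
    edit distance does not exceed the given threshold."""
    return edit_distance(a, b) <= threshold
```

Applying this exact computation to every pair in a large collection is what becomes prohibitive at scale, which is the motivation the abstract gives for comparing learned embeddings instead.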

Version published to 10.1101/2025.11.23.689944 on bioRxiv
Nov 26, 2025

