Compressed Dictionary Matching on Run-Length Encoded Strings

Philip Bille
Inge Li Gørtz
Simon J. Puglisi
Simon Rumle Tarnow

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Given a set of pattern strings P = {P1, P2, . . . Pk} and a text string S, the classic dictionary matching problem is to report all occurrences of each pattern in S. We study the dictionary problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let m and n be the total length of the patterns P and the length of the text string S, respectively, and let m and n be the total number of runs in the run-length encoding of the patterns in P and S, respectively. Our main result is an algorithm that achieves O((m+n) log logm+occ) expected time, and O(m) space, where occ is the total number of occurrences of patterns in S. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within an log logm factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in runlength encoded strings.

Version published to 10.21203/rs.3.rs-7959005/v1 on Research Square
Dec 4, 2025

Data Structures for Range Sorted Consecutive Occurrence Queries

This article has 2 authors:
1. Waseem Akram
2. Takuya Mieno
This article has no evaluationsLatest version Jan 21, 2026
Number-Theoretic Derivation of Sliding Window Coverage Parameter W for Xi Sequences

This article has 1 author:
1. Tony Yuan
This article has no evaluationsLatest version Jan 7, 2026
Assembly Theory - Formalizing Assembly Spaces, Discovering Patterns and Bounds

This article has 5 authors:
1. Wawrzyniec Bieniawski
2. Andrzej Tomski
3. Szymon Łukaszyk
4. Piotr Masierak
5. Szymon Tworz
This article has no evaluationsLatest version Dec 29, 2025

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Data Structures for Range Sorted Consecutive Occurrence Queries

Number-Theoretic Derivation of Sliding Window Coverage Parameter W for Xi Sequences

Assembly Theory - Formalizing Assembly Spaces, Discovering Patterns and Bounds