Online and offline heuristics for inferring hierarchies of repetitions in sequences

Author: JM Chessells, IM Hann, G Harrison, SJ Passmore, DKH Webb, Keith Wheatley
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

ABOUT BOOK

Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimized the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique

Powered by: