Peer-reviewed article
Abstract: The goal of this research is threefold: (i) to analyze generalized digital search trees, (ii) to derive the average profile (i.e., phrase length) of a generalization of the well-known parsing algorithm due to Lempel and Ziv, and (iii) to provide analytic tools for the asymptotic analysis of certain partial differential functional equations that often arise in the analysis of digital trees. In the generalized Lempel-Ziv parsing scheme, one partitions a sequence of symbols from a finite alphabet into phrases such that each new phrase is the shortest substring that has occurred as at most b - 1 earlier phrases (b = 1 corresponds to the original Lempel-Ziv scheme). Such a scheme can be analyzed through a generalized digital search tree in which every node can store up to b strings. In this paper, we investigate the depth of a randomly selected node in such a tree and the length of a randomly selected phrase in the generalized Lempel-Ziv scheme. These findings, together with some recent results, allow us to compute the average redundancy of the generalized Lempel-Ziv code and compare it to the ordinary Lempel-Ziv code, leading to an optimal value of b. Most of these conclusions are established using analytic techniques from the (precise) analysis of algorithms. © 1999 Society for Industrial and Applied Mathematics.
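The parsing rule described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `generalized_lz_parse`, the use of a phrase counter in place of an explicit digital search tree, and the handling of an incomplete final phrase are all assumptions made for illustration.

```python
from collections import Counter


def generalized_lz_parse(sequence, b=1):
    """Split `sequence` into phrases under the generalized Lempel-Ziv rule.

    Each new phrase is the shortest prefix of the unparsed remainder that
    has occurred as at most b - 1 earlier phrases. With b = 1 this reduces
    to the original Lempel-Ziv incremental parsing.

    Hypothetical helper for illustration; a phrase counter stands in for
    the generalized digital search tree discussed in the abstract.
    """
    if b < 1:
        raise ValueError("b must be a positive integer")
    seen = Counter()   # how many times each string was already used as a phrase
    phrases = []
    start, n = 0, len(sequence)
    while start < n:
        end = start + 1
        # Extend the candidate while it has already been used b times.
        while end < n and seen[sequence[start:end]] >= b:
            end += 1
        phrase = sequence[start:end]
        # Assumption: if the input ends first, the last phrase is kept
        # even though it may already have occurred b times.
        seen[phrase] += 1
        phrases.append(phrase)
        start = end
    return phrases


if __name__ == "__main__":
    text = "1011010100010"
    for b in (1, 2):
        parsed = generalized_lz_parse(text, b)
        mean_length = sum(map(len, parsed)) / len(parsed)
        print(b, parsed, round(mean_length, 3))
```

Running the script prints the phrases and their mean length for b = 1 and b = 2, which gives a small empirical counterpart to the phrase-length profile studied analytically in the paper.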