... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2006-04-03

Wikipedia vs. the Generic LCS Delta Algorithm

Wikipedia offers the following method to compute the longest common subsequence of two sequences (using the so-called "Generic LCS Delta" algorithm):

LCS-Delta(X,Y)
m <- LENGTH[X];
n <- LENGTH[Y];
for i <- 1 to m, do c[i,0] <-0;
for j <- 0 to n, do c[0,j] <-0;
b <- c;
for i <- 1 to m, do {
    for j <- 1 to n do {
         if (xi = yj) {
             c[i,j] <- c[i-1,j-1]+1;
             b[i,j] <- "UP-LEFT";
         }
         else if (c[i-1,j] >= c[i,j-1]) {
             c[i,j] <- c[i-1,j];
             b[i,j] <- "UP";
         }
         else {
             c[i,j] <- c[i,j-1];
             b[i,j] <- "LEFT";
         }
    }
}
return c and b

This algorithm fails for various trivial cases, such as (X, Y) = ({1}, {-1}), (X, Y) = ({1}, {}), and (X, Y) = ({}, {1}).  The problem is that the highlighted lines should read as follows:

for i <- 1 to m, do c[i,0] <- "LEFT";
for j <- 0 to n, do c[0,j] <- "UP";

If you neglect this initialization, differences at the beginning of the sequence are ignored.

Actually, I am probably being too hard on the wikipedia article since it suggests that the differences be computed from the LCS array using the following, somewhat confusing, algorithm:

The following algorithm will give the list of edits needed to make Y equal to X (the diff algorithm):

Notation: [n,m] refers to a LCS component at X[n] and Y[n]

Assumptions:

   The LCS as found by findLCS() is placed on a stack moving from the lower-right corner and
   following the direction arrows to form the LCS.

oldElem <---- [-1,-1]
while stack is not empty
   elem <---- pop from stack
   // diff does not require LCS components but add to list here if desired
   for i=oldElem[n] to elem[n]
        add delete X(i) from list of edits
   for i=oldElem[m] to elem[m]
        add insert Y(i) to list of edits

The change that I suggest makes it possible to recover the LCS by reversing the links in the table rather than using a stack.  I have an implementation in Java.

Blog Archive