## Dynamic Programming – 2 : Longest Common Subsequence

Longest Common Subsequence (LCS)

We are given 2 sequences $X = < x_1, \ldots, x_m >$ and $Y = $. Our task is to find the LCS of X and Y.

Terminology: A subsequence of X is a set of elements that can be obtained from the elements of X such that no 2 two elements of the subsequence are out of order wrt the ordering defined by X. e.g. if X = < E L E P H A N T >, then < E P H T>, < E E A > etc are subsequences of X.

A common subsequence of X and Y is a sequence Z that is a subsequence of both X and Y. The longest such Z is an LCS.

Solution.

Let OPT (m, n) denote the length of an LCS of X and Y. (Note that there may be more than one LCS’s, each of them having the same optimum length.)

Either of these 2 cases is true : either $x_m = y_n$ or $x_m \neq y_n$.

Case 1:

If $x_m = y_n$ then we can certainly include that element in the LCS and proceed to finding the LCS of $ and $.

Notation: Let $X_k$ denote $. $Y_k$ is defined analogously.

Notation: Let $OPT (i, j)$ denote the length of the LCS of $X_i$ and $Y_j$.

Therefore, we have that:

$OPT (m,n) = 1 + OPT (m-1, n-1)$ (if $x_m = y_n$)

Case 2:

If $x_m \neq y_n$, then we have that:

$OPT (m, n) = max \{ OPT (m-1, n), OPT (m, n-1) \}$.

—–

Generalizing the above, we have that, for $i > 0$ and $j> 0$:

$OPT (i, j) = 1 + OPT (i-1, j-1)$ (if $x_i = y_j)$

and

$OPT (i, j) = max \{ OPT (i, j-1), OPT (i-1, j) \}$ (if $x_i \neq y_j$)

—–

We can use the above recurrence to compute the optimum length in a bottom up fashion.

Pseudocode for computing the optimum length:

Initialize OPT (0, i) and OPT (j, 0) to be 0 for all 1 <= i <= n, and 1 <= j <= m.

For i from 1 to m:

For j from 1 to n:

Compute OPT (i,j) and OPT (j,i) using the above recurrence.

End of algorithm.

—-

What we want is the length of OPT (m,n). The above algorithm computes OPT(m,n) in O(mn) time.

—-

Using the $m \times n$ table of OPT (i,j) values, we can compute the LCS recursively.

Function: OPT_Solution (i,j) :=

If (i == 0) OR ( j == o)

// Exit; Break recursion.

Return;

If $x_i == y_j$

OPT_Solution (i-1, j-1);

Print $x_i$

Else if $OPT (i-1, j) \geq OPT (i, j-1)$

OPT_Solution (i-1, j);

Else

OPT_Solution (i, j-1);

End of algorithm.
—-
The above code for printing the LCS runs in O(m+n) time.

—–